AI Reddit — 2026-05-23

The Buzz

The community is in an uproar over GitHub Copilot’s upcoming switch to usage-based billing. Users who have simulated their June costs report that, for the same usage patterns, their $39/month Pro+ subscriptions would jump to more than $900/month. The pricing shock has prompted many to start moving to alternatives such as Cursor and Gemini Code Assist.

What People Are Building & Using

The Model Context Protocol (MCP) ecosystem is maturing quickly, but developers who connect many servers at once are running into context bloat, since every tool definition takes up room in the model’s context. Rather than loading all tools up front, builders are adding dynamic tool-discovery layers such as mcp-assistant.in, along with file-system-style routers that expose a capability to the model only when it is explicitly needed. There is also a strong push for cross-tool continuity: projects like second-brain and MarsNMe deploy a shared memory layer on Supabase and pgvector so that Claude Desktop, Cursor, and ChatGPT can finally recall your past sessions. For browser automation, the chromeflow.run extension, released after 500 hours of development, pierces closed shadow DOMs and generates human-like mouse trajectories via the Chrome DevTools Protocol (CDP); it reportedly gets past strict anti-bot screens on sites such as Stripe and Instagram.
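To make the dynamic tool-discovery pattern concrete, here is a minimal Python sketch. It assumes a simple keyword-overlap match, and the `Tool`, `ToolRouter`, and `search_tools` names are hypothetical: they do not reflect the API of mcp-assistant.in or of any MCP SDK. The model starts with a single meta-tool, and full tool definitions enter its context only when a search returns them.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """A tool definition as it might be shown to the model (hypothetical shape)."""
    name: str
    description: str
    keywords: set[str] = field(default_factory=set)


class ToolRouter:
    """Hypothetical discovery layer: only searched-for tools reach the context."""

    def __init__(self, tools: list[Tool], max_results: int = 3):
        self._tools = {t.name: t for t in tools}
        self._max_results = max_results
        self.exposed: dict[str, Tool] = {}  # tools currently in the model's context

    def initial_tools(self) -> list[str]:
        # Only the discovery meta-tool is loaded up front, not every server's tools.
        return ["search_tools"]

    def search_tools(self, query: str) -> list[str]:
        words = set(query.lower().split())
        # Score each tool by how many of its keywords appear in the query.
        scored = [(len(words & t.keywords), t.name) for t in self._tools.values()]
        ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
        hits = [name for score, name in ranked if score > 0][: self._max_results]
        for name in hits:
            self.exposed[name] = self._tools[name]  # now visible to the model
        return hits


router = ToolRouter([
    Tool("read_file", "Read a file from disk", {"read", "file", "open"}),
    Tool("query_db", "Run a SQL query", {"sql", "database", "query"}),
    Tool("send_email", "Send an email", {"email", "send", "mail"}),
])
print(router.initial_tools())                # ['search_tools']
print(router.search_tools("open the file"))  # ['read_file']
```

A real router would more likely rank tools by embedding similarity, but the shape is the same: the context holds only the tools a search has surfaced.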

Models & Benchmarks

Today’s standout benchmark belongs to Qwen3.6-35B-A3B MTP, which reached 249 tokens per second on a laptop RTX 5090M. The 35B model is a Mixture of Experts that activates only about 3B parameters per token, and it pairs that with high-acceptance speculative decoding (MTP stands for multi-token prediction). Together, these let it run nearly 3.4x faster than the smaller, dense 27B model on the same hardware. Elsewhere, a head-to-head CPU benchmark of tool calling between Needle 26M and Qwen3-0.6B found that Needle is 4.4x faster but fails in a different way: it confidently calls the wrong tool, whereas Qwen3 forgets to emit its XML tool-call tags and answers in prose.
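The payoff from speculative decoding can be estimated with the standard expected-acceptance formula: if each drafted token is accepted independently with probability alpha and the drafter proposes `draft_len` tokens, one forward pass of the target model yields on average (1 - alpha^(draft_len + 1)) / (1 - alpha) tokens. The sketch below computes that quantity. The acceptance rates are illustrative assumptions, not figures from the Qwen3.6 benchmark, and the formula ignores the cost of running the drafter itself.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Assumes each of the draft_len drafted tokens is accepted independently
    with probability alpha; every target pass yields at least one token.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (1 - alpha ** (draft_len + 1)) / (1 - alpha)


# Illustrative acceptance rates only, not measured values from the benchmark.
for alpha in (0.5, 0.8, 0.9):
    print(alpha, round(expected_tokens_per_step(alpha, draft_len=3), 2))
```

As alpha approaches 1, the expected yield approaches `draft_len + 1` tokens per pass, which is why a high acceptance rate matters so much for throughput.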

Coding Assistants & Agents

Practitioners are coming to see that writing agentic workflows differs fundamentally from chatbot prompting: it means designing a strict process, not just describing the desired output. A recurring failure mode is omitting a hard “stop condition,” which leads to unbounded execution loops and runaway API costs. To address exactly this, one developer released AI-CostGuard, a TypeScript execution firewall that detects infinite retry loops and triggers a kill switch before your budget is drained. For VS Code users whose long coding sessions turn noisy and lose context after a few hours, the new Witness Agent extension keeps a local .witness/ folder holding structured project state, context packets, and subagent handovers.
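A hard stop condition can be as simple as a guard that every agent step must pass. The Python sketch below is hypothetical and is not AI-CostGuard’s API: it caps total steps and total spend, and treats the same tool call with the same arguments repeated more than `max_repeats` times as a retry loop. The limits, the per-call cost, and the `fetch_url` call in the usage example are made up for illustration.

```python
from collections import Counter


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop hits a hard stop condition."""


class LoopGuard:
    """Hypothetical stop-condition guard for an agent loop (not AI-CostGuard's API)."""

    def __init__(self, max_steps: int, max_cost_usd: float, max_repeats: int):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.max_repeats = max_repeats
        self.steps = 0
        self.cost_usd = 0.0
        self.calls = Counter()  # (tool, args) -> number of times seen

    def check(self, tool: str, args: str, cost_usd: float) -> None:
        """Record one agent step and raise if any hard limit is exceeded."""
        self.steps += 1
        self.cost_usd += cost_usd
        key = (tool, args)
        self.calls[key] += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit of ${self.max_cost_usd:.2f} exceeded")
        if self.calls[key] > self.max_repeats:
            raise BudgetExceeded(
                f"identical call {tool}({args}) repeated {self.calls[key]} times"
            )


# Usage: a buggy agent that retries the same failing call forever.
guard = LoopGuard(max_steps=20, max_cost_usd=1.00, max_repeats=3)
stopped_reason = ""
try:
    while True:
        guard.check("fetch_url", "https://example.com", cost_usd=0.01)
except BudgetExceeded as err:
    stopped_reason = str(err)
print(stopped_reason)  # identical call fetch_url(https://example.com) repeated 4 times
```

Checking all three limits on every step means whichever runaway pattern appears first (too many steps, too much spend, or a stuck retry) ends the loop.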

Image & Video Generation

A new pull request adds native Microsoft Lens support to ComfyUI, with clean JSON prompt parsing and resolutions for several aspect ratios. For high-end upscaling, the community vibe-coded the Flux2.Klein Tile Upscaler Node, which dynamically cuts denoising steps in low-detail areas such as skies to save VRAM and render time. Meanwhile, the open-source SmartGallery DAM (digital asset manager) added a much-requested “Remix” workflow that lets users tweak prompts and regenerate ComfyUI outputs directly from a web gallery, without ever opening the node canvas.
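The post does not say how the Flux2.Klein node measures detail, so the following is only a sketch under an assumed heuristic: per-tile pixel variance, scaled into a step count between a floor and a ceiling. `steps_for_tile` and its default values are hypothetical. A flat sky tile gets the minimum step count, and a busy, high-contrast tile gets the maximum.

```python
from statistics import pvariance


def steps_for_tile(tile: list[list[float]], min_steps: int = 4,
                   max_steps: int = 20, detail_cap: float = 0.05) -> int:
    """Map a tile's pixel variance (grayscale values in [0, 1]) to a step count.

    detail_cap is the variance at or above which a tile counts as fully detailed.
    """
    pixels = [p for row in tile for p in row]
    detail = min(pvariance(pixels) / detail_cap, 1.0)  # 0.0 = flat, 1.0 = busy
    return round(min_steps + detail * (max_steps - min_steps))


sky = [[0.6] * 8 for _ in range(8)]  # flat, low-detail tile
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]  # high-contrast tile
print(steps_for_tile(sky), steps_for_tile(checker))  # 4 20
```

Tiles between those extremes get a proportional number of steps, so the savings scale with how much of the image is low-detail.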

Community Pulse

The engineering community is shifting its mindset from simple “prompt engineering” to “reasoning systems engineering,” acknowledging that LLM failures rarely stem from poor wording and more often from structural instability such as context rot and constraint decay. When it comes to writing the prompts themselves, users report that example-heavy prompting clearly beats instruction-heavy prompting: one strong example pins down format and tone better than 200 words of meticulous style guidelines. There is also growing interest in breaking out of the standard request-response loop, driving experiments with asynchronous, bidirectional tool protocols in which an agent and a running tool exchange messages live in the background without blocking the user.
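As a rough illustration of an asynchronous, bidirectional tool channel, the Python `asyncio` sketch below gives the agent and a running tool one queue in each direction. The tool streams progress updates while it works, and the agent sends an instruction back mid-run without waiting for the tool to finish. The message strings and function names are hypothetical; this is not any specific published protocol.

```python
import asyncio


async def long_running_tool(from_agent: asyncio.Queue, to_agent: asyncio.Queue) -> None:
    """Hypothetical tool: reports progress and accepts instructions while running."""
    step = 10
    progress = 0
    while progress < 100:
        await asyncio.sleep(0)  # stands in for a slice of real work; yields control
        try:
            if from_agent.get_nowait() == "speed up":  # non-blocking check for messages
                step = 25
        except asyncio.QueueEmpty:
            pass
        progress = min(progress + step, 100)
        await to_agent.put(f"progress {progress}")
    await to_agent.put("done")


async def agent(to_tool: asyncio.Queue, from_tool: asyncio.Queue) -> list:
    """Hypothetical agent: reacts to live updates and talks back mid-run."""
    log = []
    while True:
        update = await from_tool.get()
        log.append(update)
        if update == "progress 30":
            await to_tool.put("speed up")  # message the tool without stopping it
        if update == "done":
            return log


async def main() -> list:
    to_tool, from_tool = asyncio.Queue(), asyncio.Queue()
    tool_task = asyncio.create_task(long_running_tool(to_tool, from_tool))
    log = await agent(to_tool, from_tool)
    await tool_task
    return log


log = asyncio.run(main())
print(log)
```

Because neither side blocks waiting on the other, the same event loop stays free to serve the user while the tool is still running.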