AI Reddit — 2026-05-07

The Buzz

The community is in full revolt against GitHub Copilot’s new request-based pricing limits, and many users report moving to Claude Code and local alternatives. Meanwhile, Anthropic’s new Opus 4.7 is blowing minds in agentic workflows, but users are finding its safety classifiers dialed up so high that it refuses to analyze basic cybersecurity repos or discuss virology.

What People Are Building & Using

The Model Context Protocol (MCP) ecosystem is maturing quickly to cope with local memory constraints. Builders are sharing tools like mcprt, an on-demand supervisor that cuts idle server RAM from 1.5 GB to 16 MB, freeing headroom for local models. We also saw the release of harshal-mcp-proxy on npm, which consolidates multiple server configs into a single daemon. On the application side, one user shipped a full physics pinball game using LittleJS and Claude, using the game’s physics geometry as image prompts for the table art. One serious warning: a malicious HuggingFace model named Open-OSS/privacy-filter is actively distributing Windows malware, so verify your pulls before running them locally.
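One way to act on the "verify your pulls" advice is to compare a downloaded file's SHA-256 digest with a hash published by a source you already trust. The sketch below is a minimal illustration using only Python's standard library; the function names are hypothetical, and it assumes you have obtained the expected hash out of band. A matching checksum only confirms you received the file you intended to download. It does not prove the file is safe, so it complements rather than replaces checking who published the model.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so multi-gigabyte weights never sit fully in RAM.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with a hash obtained from a trusted source."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Calling `verify_download(Path("model.gguf"), expected_hash)` before loading the file would return `False` on any mismatch; both the filename and `expected_hash` here are placeholders.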

Models & Benchmarks

Multi-Token Prediction (MTP) speculative decoding is the current obsession in local LLM inference. Users running the NextN MTP implementation in llama.cpp report roughly 3x decode speedups on the Qwen 3.6 family with no reported quality loss. The Qwen 3.6 35B-A3B MoE architecture is reportedly hitting over 150 tokens per second on a single RTX 3090 Ti when paired with Q8 KV caching and MTP. There’s also intense debate around the new startup Subquadratic, which claims to have broken LLM scaling limits for a 1000x cost reduction, though the community remains deeply skeptical until independent verification emerges. Finally, OpenAI officially launched GPT-Realtime-2 alongside new live translation and Whisper models.
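For readers new to speculative decoding, the general draft-and-verify idea can be sketched with toy models. This is not the llama.cpp or NextN implementation: `toy_target` and `toy_draft` are hypothetical stand-ins (in MTP the draft tokens come from extra prediction heads rather than a separate function), and real engines verify all drafted positions in one batched forward pass rather than a Python loop. Under greedy decoding with exact-match verification, the output is identical to what the target model would produce on its own, which is the sense in which the technique can avoid quality loss.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token context to the next token


def toy_target(context: List[int]) -> int:
    """Hypothetical 'large' model: a deterministic next-token rule."""
    return (context[-1] * 3 + len(context)) % 11


def toy_draft(context: List[int]) -> int:
    """Hypothetical cheap drafter: wrong whenever the context length is a multiple of 3."""
    guess = toy_target(context)
    return (guess + 1) % 11 if len(context) % 3 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding; returns (tokens, verification rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        rounds += 1
        # 1. Draft k tokens cheaply.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. Verify against the target, keeping the target's token at every step.
        accepted: List[int] = []
        for guess in proposal:
            truth = target(tokens + accepted)
            accepted.append(truth)
            if truth != guess:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every guess matched, so the verification pass yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[len(prompt):len(prompt) + max_new_tokens], rounds
```

The speedup comes from the round count: each round commits at least one token and, when the drafter is usually right, several. How close a real setup gets to the reported 3x depends on how often the draft is accepted and on how cheap drafting is relative to a full target pass.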

Coding Assistants & Agents

GitHub Copilot’s recent switch to request-based pricing has enraged its user base, with many hitting limits within a few hours and migrating en masse to Claude Code and Opencode. However, the reality of agentic coding is setting in: developers note that while feature generation is incredibly fast, maintenance and integration are becoming brutal bottlenecks. People are finding themselves staring at 800-line alien functions generated days prior, sparking a realization that faster code generation demands more automated testing and more robust CI/CD pipelines, not less.
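As one illustration of the kind of automated guardrail this points to, a CI step could flag oversized functions before they are merged. The sketch below uses Python's standard `ast` module; the function name and the 50-line threshold are arbitrary assumptions for illustration, not something the threads prescribe, and it only covers Python source.

```python
import ast
from typing import List, Tuple


def find_long_functions(source: str, max_lines: int = 50) -> List[Tuple[str, int]]:
    """Return (name, line_count) for every function longer than max_lines."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes from Python 3.8 onward.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders
```

A pipeline could run this over changed files and fail the build when the returned list is non-empty, which forces a generated 800-line function to be split up while the context is still fresh.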

Image & Video Generation

Local video generation has found a definitive champion in LTX 2.3. The community is abandoning closed-source APIs for distilled GGUF versions of LTX 2.3 running in ComfyUI, using prompt relay nodes and audio sync to create cohesive, story-driven narratives without temporal flickering. For workflow organization, the CleanFreak custom node for ComfyUI is gaining massive traction, automatically tidying spaghetti graphs of 1200+ nodes by role (loaders, encoders, samplers) without breaking connections.
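The "tidy by role without breaking connections" idea can be sketched in a few lines. This is not CleanFreak's actual algorithm, and the node dictionaries below are a simplified hypothetical shape rather than ComfyUI's workflow schema. The point it illustrates is that a layout pass only needs to rewrite positions: as long as node ids are left alone, any links that reference those ids stay valid.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical keyword-to-role mapping, chosen for illustration only.
ROLE_KEYWORDS = {"loader": "loaders", "encode": "encoders", "sampler": "samplers"}
COLUMN_ORDER = ["loaders", "encoders", "samplers", "other"]


def role_of(node_type: str) -> str:
    """Guess a node's role from its type name."""
    lowered = node_type.lower()
    for keyword, role in ROLE_KEYWORDS.items():
        if keyword in lowered:
            return role
    return "other"


def tidy_layout(nodes: List[dict], column_width: int = 400,
                row_height: int = 120) -> List[dict]:
    """Return copies of the nodes laid out one column per role.

    Only "pos" changes; ids and every other field are copied untouched,
    so links that refer to node ids remain valid.
    """
    columns: Dict[str, List[dict]] = defaultdict(list)
    for node in nodes:
        columns[role_of(node["type"])].append(node)
    tidied = []
    for col, role in enumerate(COLUMN_ORDER):
        for row, node in enumerate(columns[role]):
            tidied.append({**node, "pos": [col * column_width, row * row_height]})
    return tidied
```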

Community Pulse

The overarching sentiment today is a mix of awe and operational fatigue. Building the initial 80% of an app in an afternoon is now routine, but users are increasingly overwhelmed by the “polish tail” and the cognitive load of debugging AI-generated legacy code. The prompt engineering meta has also shifted radically with OpenAI’s new GPT-5.5 guidance: the era of step-by-step “Chain of Thought” micromanagement is over, replaced by outcome-first prompting that states the desired result and lets the model’s native reasoning find its own efficient path.