AI Reddit — 2026-05-20

The Buzz

Today’s biggest shockwave is a hard reality check on AI API and subscription pricing. GitHub Copilot’s new token-based billing has users staring at roughly 10x cost increases, while Google’s new Gemini 3.5 Flash is inexplicably priced 14x higher than its predecessor, abandoning the “cheap and fast” ethos entirely. As developers scramble to cancel bloated subscription stacks, one user’s success running DeepSeek-V4-Flash locally on a $2,500 rig of older RTX 2080 Ti cards captures the community’s sudden, aggressive pivot toward cost control and hardware independence.

What People Are Building & Using

Over on r/LocalLLaMA, one user tackled the frustration of failed ML checkpoint saves by building smoltorrent, a distributed checkpoint-sharding system over raw TCP that runs smoothly across a Mac Mini and Raspberry Pis. The Model Context Protocol (MCP) ecosystem continues to explode; notably, the new open-source Agent Room server lets isolated agents such as Claude Code and Cursor share a chat room and collaborate asynchronously, without a human constantly copy-pasting messages between them. For those struggling with generic NotebookLM outputs, a user shared N.A.G. (Narrative Anchor & Guide), a clever Claude Skill that acts as a structured prompting engine to force citable, framework-driven slide decks and explainers out of the tool.
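The post doesn’t detail smoltorrent’s wire format, so the following is only a minimal Python sketch of the general checkpoint-sharding idea, assuming fixed-size byte shards, a SHA-256 checksum per shard, and round-robin placement across nodes. `Shard`, `shard_checkpoint`, `assign_shards`, `reassemble`, and the node names are hypothetical, and the raw-TCP transfer step is left out entirely.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Shard:
    index: int   # position of this shard in the original byte stream
    data: bytes  # raw bytes carried by this shard
    sha256: str  # checksum recorded before transfer, verified after


def shard_checkpoint(blob: bytes, num_shards: int) -> list[Shard]:
    """Split a serialized checkpoint into near-equal shards, each with a checksum."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    size = -(-len(blob) // num_shards)  # ceiling division so no trailing bytes are dropped
    shards = []
    for i in range(num_shards):
        chunk = blob[i * size:(i + 1) * size]
        shards.append(Shard(i, chunk, hashlib.sha256(chunk).hexdigest()))
    return shards


def assign_shards(shards: list[Shard], nodes: list[str]) -> dict[str, list[int]]:
    """Round-robin placement of shard indices onto (hypothetical) node names."""
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    for shard in shards:
        placement[nodes[shard.index % len(nodes)]].append(shard.index)
    return placement


def reassemble(shards: list[Shard]) -> bytes:
    """Verify every checksum, then join the shards back in index order."""
    ordered = sorted(shards, key=lambda s: s.index)
    for shard in ordered:
        if hashlib.sha256(shard.data).hexdigest() != shard.sha256:
            raise ValueError(f"shard {shard.index} failed its checksum")
    return b"".join(s.data for s in ordered)


blob = bytes(range(256)) * 4  # stand-in for a serialized checkpoint
shards = shard_checkpoint(blob, 5)
print(assign_shards(shards, ["mac-mini", "pi-1", "pi-2"]))
# {'mac-mini': [0, 3], 'pi-1': [1, 4], 'pi-2': [2]}
print(reassemble(shards) == blob)  # True
```

Per-shard checksums let the receiving side detect a truncated or corrupted shard before a save is treated as complete, which is one straightforward way to guard against the failed saves the project set out to fix.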

Models & Benchmarks

A rigorous community benchmark of Qwen3.6-35B MoE on a 16GB RTX 5080 found that Multi-Token Prediction (MTP) actually hurts performance at 128k context, because the extra memory buffer MTP requires pushes expert layers off the GPU and onto the much slower CPU. Without MTP, the model reaches a respectable 56 tok/s generation speed and handles a full 131k context without issues. Meanwhile, OpenAI announced that a general-purpose reasoning model autonomously solved an 80-year-old problem on unit distances in the plane posed by Paul Erdős, a major milestone for Level 4 AI research. Gemini 3.5 Flash also landed on the leaderboards today, scoring 76.7% on SimpleBench and 1479 on the Debate Benchmark, though users remain sharply divided on whether its capabilities justify the new premium price.
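The MTP finding comes down to a fixed VRAM budget. The Python sketch below illustrates that budgeting with the standard attention KV-cache formula (2 × layers × KV heads × head dimension × tokens × bytes per element). Every number in it is a hypothetical placeholder rather than Qwen3.6-35B’s real configuration, which may use a different attention layout, and `extra_buffer_gib` simply stands in for whatever memory MTP reserves.

```python
GIB = 1024 ** 3


def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Standard attention KV cache: keys and values for every layer and every token."""
    total_bytes = 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / GIB


def expert_spill_gib(vram_gib: float, fixed_gib: float, expert_gib: float,
                     kv_gib: float, extra_buffer_gib: float = 0.0) -> float:
    """GiB of expert weights that no longer fit on the GPU and fall back to system RAM.

    fixed_gib is a hypothetical lump for non-expert weights plus runtime overhead.
    """
    free_for_experts = max(0.0, vram_gib - fixed_gib - kv_gib - extra_buffer_gib)
    return max(0.0, expert_gib - free_for_experts)


# Hypothetical numbers: 40 layers, 4 KV heads of dim 128, 8-bit KV cache, 128k tokens
kv = kv_cache_gib(40, 4, 128, 131072, bytes_per_elem=1)
print(kv)                                                          # 5.0
print(expert_spill_gib(16.0, 2.0, 8.0, kv))                        # 0.0 (no extra buffer)
print(expert_spill_gib(16.0, 2.0, 8.0, kv, extra_buffer_gib=1.5))  # 0.5 (with a buffer)
```

The KV cache grows linearly with context, and any fixed buffer added on top of it comes straight out of the room left for expert weights; with these made-up numbers, a 1.5 GiB buffer spills 0.5 GiB of experts into system RAM.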

Coding Assistants & Agents

The mood around GitHub Copilot is mutinous, with developers migrating to Cursor and OpenCode after seeing their projected monthly bills jump from $39 to nearly $387 (about 10x) for standard studio usage. For those deep into agentic workflows, a veteran user noted that the real bottleneck in tools like Claude Code is now the human passively watching the terminal; the suggested fix is to stop watching each generation and run multiple agents in parallel instead. On the prompting front, a well-argued post claimed that inconsistent code generation is less a model flaw than a side effect of the chat interface, advocating strict, sequential execution schemas over open-ended conversational requests to get typed, parseable outputs.
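The post’s exact schema format isn’t given; one plausible reading is sketched below in Python, assuming each step declares the fields and types it must return, the model is asked for JSON only, and any off-schema reply is rejected before the next step runs. `Step`, `parse_step_output`, `run_pipeline`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real LLM call.

```python
import json
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    required: dict[str, type]  # field name -> expected Python type after json.loads


def parse_step_output(step: Step, raw: str) -> dict:
    """Parse one step's raw model reply and reject anything that breaks its schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{step.name}: reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError(f"{step.name}: expected a JSON object")
    for field, expected in step.required.items():
        if field not in data:
            raise ValueError(f"{step.name}: missing field {field!r}")
        if not isinstance(data[field], expected):
            raise ValueError(f"{step.name}: {field!r} should be {expected.__name__}")
    return data


def run_pipeline(steps: list[Step], call_model: Callable[[str], str]) -> dict:
    """Run steps strictly in order; each prompt carries the previous validated output."""
    context: dict = {}
    for step in steps:
        prompt = json.dumps({
            "step": step.name,
            "return_fields": {k: t.__name__ for k, t in step.required.items()},
            "context": context,
        })
        context = parse_step_output(step, call_model(prompt))
    return context


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that answers with JSON only."""
    step = json.loads(prompt)["step"]
    if step == "plan":
        return '{"files": ["app.py"], "summary": "add a health endpoint"}'
    return '{"path": "app.py", "code": "print(1)"}'


steps = [
    Step("plan", {"files": list, "summary": str}),
    Step("write", {"path": str, "code": str}),
]
print(run_pipeline(steps, fake_model))  # {'path': 'app.py', 'code': 'print(1)'}
```

Rejecting a malformed reply at the step boundary, instead of letting a conversation drift, is what keeps each output typed and parseable; a real pipeline could retry the failed step rather than raising.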

Image & Video Generation

The highly capable Anima base model is frustrating users with chaotic stylistic shifts and excessive creativity; the community found that wrapping artist tags in blocks, increasing the shift value, and strictly using the @ operator drastically stabilize the output. AsymFLUX.2 klein also got an official ComfyUI release: it is a pixel-space asymmetric flow adapter that generates highly realistic images in the Oklab color space without requiring a VAE. For those hitting WSL freezes during massive video or image upscales, a new “Safe Chunked Image Blend” node handles CUDA resizing explicitly, preventing silent, memory-choking CPU offloads during batch processing.
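For readers unfamiliar with Oklab: it is a perceptual color space defined by Björn Ottosson, computed from linear sRGB via a matrix, a per-channel cube root, and a second matrix. The Python sketch below implements that published sRGB-to-Oklab conversion; it is not AsymFLUX.2 klein’s code, and how the adapter uses Oklab internally isn’t covered here.

```python
import math


def srgb_channel_to_linear(c: float) -> float:
    """Remove the sRGB transfer curve from one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_oklab(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert gamma-encoded sRGB in [0, 1] to Oklab (L, a, b)."""
    r, g, b = (srgb_channel_to_linear(c) for c in (r, g, b))
    # Linear sRGB -> approximate cone responses (LMS)
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Sign-preserving cube root, so out-of-gamut negatives never produce complex numbers
    l_, m_, s_ = (math.copysign(abs(x) ** (1 / 3), x) for x in (l, m, s))
    # Compressed LMS -> lightness L plus opponent axes a (green-red) and b (blue-yellow)
    return (
        0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_,
        1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_,
        0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_,
    )


print(srgb_to_oklab(1.0, 1.0, 1.0))  # white: L close to 1, a and b close to 0
print(srgb_to_oklab(1.0, 0.0, 0.0))  # pure red: roughly (0.628, 0.225, 0.126)
```

Because Oklab separates lightness from the two color axes, it is a convenient space for working directly on pixels, which is one plausible reason a VAE-free, pixel-space model would choose it.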

Community Pulse

Subscription fatigue has officially hit a breaking point, with users proudly canceling $60+ monthly stacks (ChatGPT Plus, Gemini Advanced, Copilot) after realizing free tiers cover 90% of their needs, and keeping only Claude Pro for deep-context work. Even Claude loyalists, however, are furious about increasingly aggressive 11 AM message limits and silent model downgrades. Finally, a growing, uneasy consensus among professionals holds that non-technical executives suffer from a dangerous “AI Trust Gap,” acting on raw ChatGPT summaries to set company policy without checking them for hallucinations.