<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Speculative Decoding on MacWorks</title><link>https://macworks.dev/tags/speculative-decoding/</link><description>Recent content in Speculative Decoding on MacWorks</description><generator>Hugo</generator><language>en</language><atom:link href="https://macworks.dev/tags/speculative-decoding/index.xml" rel="self" type="application/rss+xml"/><item><title>2026-05-10</title><link>https://macworks.dev/docs/archives/ai_reddit/ai-reddit-2026-05-10/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/archives/ai_reddit/ai-reddit-2026-05-10/</guid><description>&lt;details&gt;
&lt;summary&gt;Sources&lt;/summary&gt;
&lt;div class="markdown-inner"&gt;
&lt;ul&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/aipromptprogramming/.rss"&gt;r/AIPromptProgramming&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/chatgpt/.rss"&gt;r/ChatGPT&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/chatgptcoding/.rss"&gt;r/ChatGPTCoding&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/claudeai/.rss"&gt;r/ClaudeAI&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/cline/.rss"&gt;r/Cline&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/githubcopilot/.rss"&gt;r/GithubCopilot&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/localllama/.rss"&gt;r/LocalLLaMA&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/mcp/.rss"&gt;r/MCP&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/notebooklm/.rss"&gt;r/NotebookLM&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/OpenAI/.rss"&gt;r/OpenAI&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/PromptEngineering/.rss"&gt;r/PromptEngineering&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/roocode/.rss"&gt;r/RooCode&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/singularity/.rss"&gt;r/Singularity&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/stablediffusion/.rss"&gt;r/StableDiffusion&lt;/a&gt;&lt;/li&gt;

&lt;/ul&gt;
&lt;/div&gt;
&lt;/details&gt;


&lt;h1 id="ai-reddit--2026-05-10"&gt;AI Reddit — 2026-05-10&lt;a class="anchor" href="#ai-reddit--2026-05-10"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="the-buzz"&gt;The Buzz&lt;a class="anchor" href="#the-buzz"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The most critical discovery today is a massive, systematical benchmark of Speculative Decoding (MTP) quants that fundamentally changes how we should be configuring local inference. A user ran over 300 tests on Qwen 3.6 27B and proved that MTP nearly triples token generation speeds for coding tasks (with an 89% draft acceptance rate), but actively slows down creative writing and narrative generation (dropping below 40% acceptance). Because memory bandwidth dictates the benefit of speculative decoding, users are realizing they need to toggle MTP dynamically based on the exact nature of their prompt, rather than treating it as a global speedup.&lt;/p&gt;</description></item><item><title>AI Reddit</title><link>https://macworks.dev/docs/week/ai_reddit/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/ai_reddit/</guid><description>&lt;h1 id="ai-reddit--week-of-2026-05-08-to-2026-05-15"&gt;AI Reddit — Week of 2026-05-08 to 2026-05-15&lt;a class="anchor" href="#ai-reddit--week-of-2026-05-08-to-2026-05-15"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="the-buzz"&gt;The Buzz&lt;a class="anchor" href="#the-buzz"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The AI subsidy era abruptly ended this week as a dual billing shockwave from GitHub and Anthropic fundamentally altered the agentic landscape. Copilot&amp;rsquo;s shift to usage-based billing triggered a mass exodus as developers stared down projected monthly invoices exceeding $1,000, while Anthropic simultaneously cracked down on unlimited background loops for Claude Code by moving it to a metered SDK credit. Amidst this financial panic, the open-source community rallied, notably transitioning the beloved but defunct Roo extension into a community-maintained fork called &lt;a href="https://macworks.dev/posts/zoo-is-the-new-roo"&gt;Zoo is the new Roo&lt;/a&gt;. The broader architectural conversation has shifted away from raw context window sizes toward solving the Model Context Protocol (MCP) &amp;ldquo;Context Tax&amp;rdquo; through lazy-loading middleware and semantic tool discovery, actively preventing agents from drowning in their own bloated schemas.&lt;/p&gt;</description></item></channel></rss>