<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Local Inference on MacWorks</title><link>https://macworks.dev/tags/local-inference/</link><description>Recent content in Local Inference on MacWorks</description><generator>Hugo</generator><language>en</language><atom:link href="https://macworks.dev/tags/local-inference/index.xml" rel="self" type="application/rss+xml"/><item><title>Engineer Reads</title><link>https://macworks.dev/docs/week/blogs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/blogs/</guid><description>&lt;h1 id="engineering-reads--week-of-2026-05-07-to-2026-05-15"&gt;Engineering Reads — Week of 2026-05-07 to 2026-05-15&lt;a class="anchor" href="#engineering-reads--week-of-2026-05-07-to-2026-05-15"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="week-in-review"&gt;Week in Review&lt;a class="anchor" href="#week-in-review"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This week’s engineering discourse reflects a mature industry grappling with system boundaries and human intent. From constraining unpredictable AI integrations into strictly bounded functional workflows to leveraging organizational psychology to structure open-source compiler architecture, practitioners are aggressively reclaiming control over non-determinism. We are seeing a distinct pushback against buzzword-driven hype in favor of operational stability, rigorous domain modeling, and trusting native web standards over heavyweight abstractions.&lt;/p&gt;</description></item><item><title>2026-05-14</title><link>https://macworks.dev/docs/week/blogs/engineer-blogs-2026-05-14/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/blogs/engineer-blogs-2026-05-14/</guid><description>&lt;h1 id="engineering-reads--2026-05-14"&gt;Engineering Reads — 2026-05-14&lt;a class="anchor" href="#engineering-reads--2026-05-14"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="the-big-idea"&gt;The Big Idea&lt;a class="anchor" href="#the-big-idea"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Integrating AI into software engineering requires a deliberate architecture of boundaries: treating LLMs as predictable functions rather than autonomous agents, preserving human review for skill growth, and aggressively isolating non-determinism across our systems.&lt;/p&gt;
&lt;h2 id="deep-reads"&gt;Deep Reads&lt;a class="anchor" href="#deep-reads"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://martinfowler.com/bliki/InterrogatoryLLM.html"&gt;Bliki: Interrogatory LLM&lt;/a&gt;&lt;/strong&gt; · Martin Fowler
Fowler proposes using LLMs to reverse the standard prompting dynamic: instead of feeding the model context, prompt the LLM to interview a human expert one question at a time to build context. This approach can generate comprehensive design documents or verify existing complex specifications by extracting information from stakeholders who find writing difficult. The resulting text may bear the distinct cadence of AI generation, but capturing the raw domain knowledge outweighs stylistic drawbacks. This is a pragmatic read for technical leads and product managers struggling to pull coherent specifications out of stakeholders&amp;rsquo; heads.&lt;/p&gt;</description></item><item><title>AI Reddit</title><link>https://macworks.dev/docs/today/ai-reddit-2026-05-20/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/today/ai-reddit-2026-05-20/</guid><description>&lt;details&gt;
&lt;summary&gt;Sources&lt;/summary&gt;
&lt;div class="markdown-inner"&gt;
&lt;ul&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/aipromptprogramming/.rss"&gt;r/AIPromptProgramming&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/chatgpt/.rss"&gt;r/ChatGPT&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/chatgptcoding/.rss"&gt;r/ChatGPTCoding&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/claudeai/.rss"&gt;r/ClaudeAI&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/cline/.rss"&gt;r/Cline&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/githubcopilot/.rss"&gt;r/GithubCopilot&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/localllama/.rss"&gt;r/LocalLLaMA&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/mcp/.rss"&gt;r/MCP&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/notebooklm/.rss"&gt;r/NotebookLM&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/OpenAI/.rss"&gt;r/OpenAI&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/PromptEngineering/.rss"&gt;r/PromptEngineering&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/roocode/.rss"&gt;r/RooCode&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/singularity/.rss"&gt;r/Singularity&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.reddit.com/r/stablediffusion/.rss"&gt;r/StableDiffusion&lt;/a&gt;&lt;/li&gt;

&lt;/ul&gt;
&lt;/div&gt;
&lt;/details&gt;


&lt;h1 id="ai-reddit--2026-05-20"&gt;AI Reddit — 2026-05-20&lt;a class="anchor" href="#ai-reddit--2026-05-20"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="the-buzz"&gt;The Buzz&lt;a class="anchor" href="#the-buzz"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The biggest shockwave today is a severe reality check on AI API and subscription pricing. GitHub Copilot&amp;rsquo;s new token-based billing has users staring at 10x cost increases, while Google&amp;rsquo;s new Gemini 3.5 Flash is inexplicably priced 14x higher than its predecessor, completely abandoning the &amp;ldquo;cheap and fast&amp;rdquo; ethos. As developers scramble to cancel bloated subscription stacks, the contrasting triumph of a user running DeepSeek-V4-Flash locally on a $2,500 rig of legacy RTX 2080 Tis perfectly captures the community&amp;rsquo;s sudden, aggressive pivot toward cost-control and hardware independence.&lt;/p&gt;</description></item></channel></rss>