<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>ML Infrastructure on MacWorks</title><link>https://macworks.dev/tags/ml-infrastructure/</link><description>Recent content in ML Infrastructure on MacWorks</description><generator>Hugo</generator><language>en</language><atom:link href="https://macworks.dev/tags/ml-infrastructure/index.xml" rel="self" type="application/rss+xml"/><item><title>Week 19 Summary</title><link>https://macworks.dev/docs/month/tech/weekly-2026-W19/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/month/tech/weekly-2026-W19/</guid><description>&lt;h1 id="engineering--scale--week-of-2026-04-18-to-2026-05-01"&gt;Engineering @ Scale — Week of 2026-04-18 to 2026-05-01&lt;a class="anchor" href="#engineering--scale--week-of-2026-04-18-to-2026-05-01"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="week-in-review"&gt;Week in Review&lt;a class="anchor" href="#week-in-review"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The dominant engineering theme this week is the maturation of AI integrations, which are shifting from black-box endpoints to tightly governed, deterministic pipelines. Organizations are prioritizing architectural decoupling: separating routing metadata from data payloads to cut latency, and embedding infrastructure directly into application runtimes to avoid cross-network orchestration bottlenecks.&lt;/p&gt;
&lt;h2 id="top-stories"&gt;Top Stories&lt;a class="anchor" href="#top-stories"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;[Offline Generation &amp;amp; Deterministic AI Pipelines]&lt;/strong&gt; · Amazon &amp;amp; Sun Finance · &lt;a href="#"&gt;Source&lt;/a&gt;
Instead of placing a massive LLM on the production critical path, Amazon used an OPT-175B model purely for offline synthetic data generation, then used that data to instruction-tune a smaller, faster model (COSMO-LM) for real-time serving. Similarly, Sun Finance avoided Claude&amp;rsquo;s PII safety throttles by delegating raw document extraction to a deterministic OCR layer (Textract) and restricting the LLM to JSON structuring. Both cases point to a growing preference for using frontier models as offline data synthesizers or constrained formatting nodes rather than as monolithic runtime engines.&lt;/p&gt;</description></item><item><title>2026-05-01</title><link>https://macworks.dev/docs/archives/tech/tech-2026-05-01/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/archives/tech/tech-2026-05-01/</guid><description>&lt;details&gt;
&lt;summary&gt;Sources&lt;/summary&gt;
&lt;div class="markdown-inner"&gt;
&lt;ul&gt;

&lt;li&gt;&lt;a href="https://medium.com/feed/airbnb-engineering"&gt;Airbnb Engineering&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/feed/"&gt;Amazon AWS AI Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://aws.amazon.com/cn/blogs/architecture/feed/"&gt;AWS Architecture Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/opensource/feed/"&gt;AWS Open Source Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://brett.trpstra.net/brettterpstra"&gt;BrettTerpstra.com&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://blog.bytebytego.com/feed"&gt;ByteByteGo&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://blog.cloudflare.com/rss/"&gt;CloudFlare&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://dropbox.tech/feed"&gt;Dropbox Tech Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://engineering.fb.com/feed/"&gt;Facebook Code&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://github.blog/engineering.atom"&gt;GitHub Engineering&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://blog.google/innovation-and-ai/technology/ai/rss/"&gt;Google AI Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://deepmind.google/blog/rss.xml"&gt;Google DeepMind&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://feeds.feedburner.com/GoogleOpenSourceBlog"&gt;Google Open Source Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.hashicorp.com/blog/feed.xml"&gt;HashiCorp Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://feed.infoq.com/?token=XQ47eEiAJqUtN8043NhEqJ6kZB8XallO"&gt;InfoQ&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://engineering.atspotify.com/feed/"&gt;Spotify Engineering&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://www.microsoft.com/en-us/research/feed/"&gt;Microsoft Research&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://hacks.mozilla.org/feed/"&gt;Mozilla Hacks&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://netflixtechblog.com/feed"&gt;Netflix Tech Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://feeds.feedburner.com/nvidiablog"&gt;NVIDIA Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://feeds.feedburner.com/oreilly/radar/atom"&gt;O&amp;#39;Reilly Radar&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://openai.com/news/rss.xml"&gt;OpenAI Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://developers.soundcloud.com/blog/blog.rss"&gt;SoundCloud Backstage Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://stripe.com/blog/feed.rss"&gt;Stripe Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://rsshub.bestblogs.dev/deeplearning/the-batch"&gt;The Batch | DeepLearning.AI | AI News &amp;amp; Insights&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://blog.dropbox.com/feed"&gt;The Dropbox Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://github.blog/feed/"&gt;The GitHub Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://medium.com/feed/netflix-techblog"&gt;The Netflix Tech Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://blogs.microsoft.com/feed/"&gt;The Official Microsoft Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://vercel.com/atom"&gt;Vercel Blog&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="https://engineeringblog.yelp.com/feed.xml"&gt;Yelp Engineering and Product Blog&lt;/a&gt;&lt;/li&gt;

&lt;/ul&gt;
&lt;/div&gt;
&lt;/details&gt;


&lt;h1 id="engineering--scale--2026-05-01"&gt;Engineering @ Scale — 2026-05-01&lt;a class="anchor" href="#engineering--scale--2026-05-01"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="signal-of-the-day"&gt;Signal of the Day&lt;a class="anchor" href="#signal-of-the-day"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Netflix decoupled its ML model routing logic from its data-plane proxy, eliminating a centralized service that added 10-20 ms of serialization latency. Routing metadata is now generated by a specialized &amp;ldquo;Lightbulb&amp;rdquo; service that injects routing keys into headers, so the existing Envoy proxy can route large payloads without costly deserialization. The result illustrates why strict control-plane/data-plane separation matters for low-latency ML serving at scale.&lt;/p&gt;</description></item></channel></rss>