
Engineering @ Scale — 2026-06-12

Signal of the Day

More delegation in multi-agent systems is not always better; it can easily become a liability that degrades performance. GitHub discovered that keeping simple tasks inside the main agent, rather than spinning up specialist subagents, eliminated unnecessary coordination overhead and reduced overall tool failures by 23%.

Deep Dives

Scaling Security Insights: how we achieved a 10x increase in global scanning capacity · Cloudflare Cloudflare needed to increase its security scanning throughput from 10 to 100 scans per second but could not easily add Apache Kafka partitions because of resource constraints on shared brokers. To scale up, they implemented batched message processing and eliminated head-of-line blocking by splitting Kafka consumer groups into distinct “fast” and “slow” lanes. They resolved connection pool exhaustion by moving their internal API from an active-active architecture spanning Portland and Amsterdam to an active-passive setup, which removed a 50ms cross-region latency penalty. They also optimized database inserts with a hybrid approach, using UNNEST for small batches and COPY for large ones to avoid system table bloat. Together, these changes raised throughput roughly tenfold without simply adding hardware.
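
As a rough illustration of two of the ideas in the Cloudflare item, the sketch below routes work into a fast or slow lane and picks an UNNEST or COPY statement by batch size. The cutoffs, table, and column names are hypothetical; the summary does not give Cloudflare's actual values or schema.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the summary does not state Cloudflare's actual values.
COPY_THRESHOLD_ROWS = 1_000
SLOW_LANE_SECONDS = 5.0


@dataclass(frozen=True)
class InsertPlan:
    strategy: str  # "unnest" or "copy"
    sql: str


def pick_lane(expected_seconds: float) -> str:
    """Send long-running scans to their own lane so they cannot block short ones."""
    return "slow" if expected_seconds >= SLOW_LANE_SECONDS else "fast"


def plan_insert(table: str, columns: list[tuple[str, str]], row_count: int) -> InsertPlan:
    """Choose a PostgreSQL bulk-insert statement based on batch size.

    `columns` is a list of (column_name, postgres_type) pairs. Identifiers are
    interpolated directly, so they must come from trusted code, not user input.
    """
    names = ", ".join(name for name, _ in columns)
    if row_count >= COPY_THRESHOLD_ROWS:
        # Large batch: stream rows through COPY.
        return InsertPlan("copy", f"COPY {table} ({names}) FROM STDIN")
    # Small batch: one array parameter per column, expanded row-wise by UNNEST.
    arrays = ", ".join(
        f"${i}::{pg_type}[]" for i, (_, pg_type) in enumerate(columns, start=1)
    )
    return InsertPlan(
        "unnest", f"INSERT INTO {table} ({names}) SELECT * FROM UNNEST({arrays})"
    )
```

In this sketch each lane would map to its own consumer group, so a backlog of slow scans cannot delay the fast ones.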

How we made GitHub Copilot CLI more selective about delegation · GitHub GitHub observed that delegating simple coding tasks to specialist subagents within the Copilot CLI introduced unnecessary coordination overhead, wait times, and tool failures. By analyzing agent trajectories, they adjusted their orchestration policy so the main agent executes focused tasks—like finding, reading, and editing a single file—directly rather than spinning up a helper. Subagents are now strictly reserved for tasks requiring broad exploration or independent, parallel execution. This architectural refinement reduced tool failures by 23% and lowered total user wait time by up to 5% at the P95 level. The core lesson is that in multi-agent systems, treating delegation as a parallelism tool rather than a default pause button improves reliability.
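
The delegation rule described in the GitHub item can be sketched as a small policy function. This is an illustrative assumption, not GitHub's implementation; the `Task` fields and the `choose_executor` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    needs_broad_exploration: bool = False  # e.g. searching an unfamiliar codebase
    independent_subtasks: int = 1          # parts that could run in parallel


def choose_executor(task: Task) -> str:
    """Return "subagent" only when delegation buys exploration or parallelism."""
    if task.needs_broad_exploration or task.independent_subtasks > 1:
        return "subagent"
    # Focused work (find, read, edit a single file) stays in the main agent.
    return "main"
```

The default branch is the main agent, so delegation has to be justified by the task rather than assumed.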

Slack Eliminates SSH in EMR Pipelines, Migrates 700+ Jobs to Rest-Based Architecture · Slack Slack needed to eliminate direct SSH execution across its production data clusters to improve security and observability. The engineering team executed a massive migration, moving more than 700 Airflow operators off Amazon EMR SSH pipelines. They replaced this fragile access method with Quarry, a custom REST-driven orchestration layer. This architectural shift to a server-side job lifecycle model significantly hardens their data infrastructure while removing direct operational access to production nodes.

Presentation: Moving Mountains: Migrating Legacy Code in Weeks instead of Years · ServiceTitan Refactoring large-scale legacy codebases often spans years and requires massive developer coordination. ServiceTitan rethought this entirely by implementing an AI “assembly line” pattern for architectural migrations. They decomposed the legacy refactoring into highly standardized micro-tasks, allowing an army of AI agents to execute changes in parallel. To solve the severe risk of LLM hallucinations corrupting the codebase, they wrapped the entire pipeline in programmatically rigid validation loops, ensuring speed did not compromise system integrity.
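
A validation loop of the kind the ServiceTitan item describes might look like the sketch below: a generated change is accepted only if every programmatic check passes, and failures are fed back for another attempt. The function names, the retry limit, and the feedback mechanism are assumptions made for illustration.

```python
from typing import Callable, Optional

# generate(task, feedback) returns candidate code; feedback holds earlier errors.
Generator = Callable[[str, list[str]], str]
# A validator returns an error message, or None when the candidate passes.
Validator = Callable[[str], Optional[str]]


def migrate_with_validation(
    task: str,
    generate: Generator,
    validators: list[Validator],
    max_attempts: int = 3,  # hypothetical retry budget
) -> Optional[str]:
    """Return the first candidate that passes every validator, else None."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        errors = [msg for check in validators if (msg := check(candidate)) is not None]
        if not errors:
            return candidate
        feedback = errors  # the next attempt sees why the last one was rejected
    return None  # never merge an unvalidated change
```

In practice the validators would be compilers, test suites, or linters rather than string checks; the key property is that nothing reaches the codebase without passing them.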

Building Supercharger: How Rocket Close optimized title operations with agentic AI · Rocket Close Rocket Close encountered severe operational bottlenecks handling complex, state-specific title examinations and manual data research across fragmented systems. They designed an agentic architecture using Strands Agents and the Model Context Protocol (MCP), isolating the logic for interacting with internal databases into distinct, maintainable tool integrations. A major architectural takeaway was shifting away from multi-step conversational data querying; instead, MCP tools pull all necessary order data in a single call before passing it to the LLM for synthesis, achieving a 3x latency improvement. They also found that offloading security enforcement to session attributes instead of baking it into business logic created much cleaner, more robust access controls across the agent.
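
The single-call pattern and session-based access control from the Rocket Close item can be approximated in plain Python. The sketch deliberately avoids any real MCP or Strands API; the `Session` type, the in-memory stores, and the state-based access rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str
    allowed_states: frozenset  # access scope carried by the session, not by tool logic


# Hypothetical stand-in data; a real tool would query internal systems.
ORDERS = {"A1": {"state": "MI", "status": "open"}}
LIENS = {"A1": ["tax lien 2019"]}
DOCUMENTS = {"A1": ["deed.pdf"]}


def get_order_context(order_id: str, session: Session) -> dict:
    """Gather everything the model needs for one order in a single tool call."""
    order = ORDERS[order_id]
    # Access is decided from session attributes before any data is returned.
    if order["state"] not in session.allowed_states:
        raise PermissionError(f"{session.user_id} may not view order {order_id}")
    return {
        "order": order,
        "liens": LIENS.get(order_id, []),
        "documents": DOCUMENTS.get(order_id, []),
    }
```

The model then receives one complete payload to synthesize from, instead of issuing a chain of follow-up queries.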

Oracle’s OpenJDK Bans Generative AI Contributions While Oracle’s GraalVM Allows Them · Oracle Governing the intellectual property of generative AI within massive open-source ecosystems creates intense legal and technical friction. The Oracle-backed OpenJDK Governing Board instituted an interim policy completely banning AI-generated contributions to protect project integrity. In stark contrast, Oracle’s GraalVM project embraced them, explicitly permitting AI-assisted code. Both projects operate under the identical Oracle Contributor Agreement, underscoring how engineering leadership must navigate highly subjective IP risk tolerances even within a unified corporate umbrella.

Podcast: Craig McLuckie on Culture as a Team’s Operating System in the AI Era · Stacklok The rapid proliferation of AI coding assistants is fundamentally destabilizing traditional engineering team dynamics and career progression. Craig McLuckie argues that as boilerplate generation becomes commoditized, organizational culture must be deliberately engineered to serve as a team’s operating system. Technical leaders must shift their focus away from raw code output toward designing resilient, communicative structures. Organizations that fail to actively design these new career paths will struggle to retain talent as the rote mechanics of software development disappear.

Run Untrusted AI Agent Code Safely with Azure Container Apps Sandboxes · Microsoft Running untrusted code autonomously generated by AI agents introduces severe security and execution risks to enterprise infrastructure. Microsoft mitigated this threat vector by launching Azure Container Apps Sandboxes, a novel ARM resource type. These hardware-isolated environments allow engineering teams to safely execute untrusted logic while scaling to thousands of concurrent instances. Because each sandbox boots from an OCI disk image in under a second and incurs no idle costs, it offers a highly elastic, secure execution tier for agentic systems.

Pinecone Brings AI Agents Directly to Enterprise Data with Microsoft OneLake Integration · Pinecone Connecting autonomous AI agents to vast, raw corporate data typically forces data engineering teams to build brittle, high-latency synchronization pipelines. Pinecone tackled this architectural bottleneck by deeply integrating its Nexus knowledge engine directly with Microsoft OneLake. By querying the data exactly where it resides, this approach eliminates massive data duplication workflows. It fundamentally alters how enterprise AI systems reason over internal metrics, drastically reducing the time-to-insight for agentic applications.

Angular’s Official Agent Skills Helps AI Coding Tools Write Modern Angular · Google AI coding assistants frequently generate hallucinated or deprecated syntax because their training data lags behind current framework standards. To counter this, the Angular team published a repository of curated “Agent Skills”. The repository provides structured scaffolding applications and code generation rules designed specifically for agent consumption. By injecting these up-to-date conventions into the model’s context window, engineers can steer generated code toward current best practices.

Google Launches Colab CLI for Developers, Automation, and AI Agents · Google Connecting local terminal workflows to remote machine learning instances frequently disrupts developer momentum. Google addressed this environment fragmentation by launching the Colab CLI, bridging local tools with cloud runtimes. This allows developers to programmatically execute commands on remote ML hardware without leaving their IDE. Crucially, this interface is equally accessible to autonomous AI agents, enabling them to directly orchestrate heavy ML tasks across the cloud.

Built from the inside out: How AWS Professional Services became a frontier team first · AWS Traditional consulting workflows suffer from massive non-coding overhead and slow feedback loops. AWS Professional Services re-architected their delivery model by creating a multi-agent system, the “Delivery Agent”, which utilizes structured specs instead of prose requirements. By treating AI as a foundational parallel worker rather than a simple assistant, they dramatically shifted testing left and compressed months-long engagements into days. This approach proves that true productivity gains come from redesigning the underlying engineering workflow around agent capabilities, not just layering tools on top of legacy processes.

From PDFs to insights: Architecting an intelligent document processing pipeline with AWS generative AI services · AWS Processing complex, multimodal documents at an enterprise scale traditionally bottlenecks on manual OCR pipelines lacking semantic context. AWS architected a fully serverless, event-driven pipeline using Bedrock Data Automation and Step Functions to intelligently handle document splitting, extraction, and validation for files up to 3,000 pages. To maintain stability at a scale of 50,000 concurrent PDFs, the system leverages asynchronous processing with task tokens and dynamic blueprints to standardize outputs. This design successfully offloads the orchestration complexity into managed services, freeing specialized task agents to perform cross-reference validations against semantic knowledge bases.

Build a meeting prep and follow-up assistant with Amazon Quick and Cisco Webex MCP servers · AWS & Cisco Aggregating operational context across disparate communication tools creates significant workflow friction for enterprise teams. AWS and Cisco solved this by combining Amazon Quick chat agents with remote Model Context Protocol (MCP) servers, which expose Webex meetings, Vidcast, and messaging actions directly to the AI. By standardizing integrations through MCP, the agent dynamically selects the correct read or write tool based on the user’s prompt without requiring hardcoded orchestration. A key architectural principle here is applying least privilege at the tool level—starting strictly with read-only operations and relying on OAuth 2.0 to ensure the agent only accesses what the authenticated user is permitted to see.
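
Tool-level least privilege, as described in the AWS and Cisco item, could be sketched as a filter over a tool registry. The tool names and scope strings below are invented for illustration and are not the actual Webex MCP tools or OAuth scopes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool          # True if the tool changes state
    required_scope: str   # hypothetical OAuth scope string


# Hypothetical registry; real tool names and scopes will differ.
TOOLS = [
    Tool("list_meetings", writes=False, required_scope="meetings:read"),
    Tool("get_transcript", writes=False, required_scope="transcripts:read"),
    Tool("send_message", writes=True, required_scope="messages:write"),
]


def exposed_tools(granted_scopes: set[str], allow_writes: bool = False) -> list[str]:
    """List the tools an agent may call: read-only by default, never beyond the user's scopes."""
    return [
        tool.name
        for tool in TOOLS
        if tool.required_scope in granted_scopes and (allow_writes or not tool.writes)
    ]
```

Write tools stay hidden until `allow_writes` is switched on, which mirrors starting with read-only operations and widening access later.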

Ire identifies another LOTUSLITE specimen · Microsoft Traditional signature-based malware detection fails against novel variants that share tactics but lack known indicators of compromise. Microsoft deployed Project Ire, an autonomous, LLM-driven agent, to statically reverse-engineer a suspicious DLL via decompilers without any human prompts. Ire successfully analyzed the install routine, C2 packet layout, and persistence mechanisms to issue a “malicious” verdict, identifying it as a LOTUSLITE variant. The crucial engineering lesson was the agent’s ability to resist “hallucinating” a kernel-driver installation just because of a suggestive function name, proving that rigorous evidence chains prevent false positives in agentic security analysis.

How Preply combines AI and human tutors to personalize learning · Preply Scaling personalized language learning traditionally bottlenecks on the availability of human tutors to review materials. Preply engineered a solution by integrating OpenAI models into their core instructional loop. This enables the automatic generation of lesson summaries and highly personalized language exercises for individual students. By combining human tutors with agentic AI generation, they’ve architected a system that maintains high educational quality while operating at a massive global scale.

New OpenAI Academy courses for the next era of work · OpenAI As generative AI moves from chatbots to complex, agent-driven engineering workflows, the barrier to creating reliable systems remains high. OpenAI identified that developers lacked structured frameworks for building repeatable, production-ready AI pipelines. To close this gap, they launched three targeted Academy courses focused entirely on applying agents to everyday workflows. This marks a strategic push to standardize the architectural patterns engineers use when integrating autonomous agents into enterprise systems.

Introducing Vercel Drop · Vercel Deploying single-page applications or static assets often requires disproportionate CI/CD overhead and Git repository configuration. Vercel built a frictionless deployment primitive, Vercel Drop, allowing engineers to drag-and-drop folders directly into the browser. The system automatically detects the underlying framework layer, initiates the required build steps, and publishes instantly. This provides a zero-configuration deployment path for LLM-generated code or quick prototypes, with the option to attach Git for continuous deployment later.

Chat SDK adds AgentPhone support · Vercel Supporting both voice and text interactions typically forces teams to build entirely separate backend data pipelines and state managers. Vercel solved this fragmentation by releasing the AgentPhone adapter within their Chat SDK. The adapter routes all voice calls and text messages to a unified webhook, seamlessly transcribing calls into standard text messages upon completion. By consolidating mixed modalities into a single thread, AI bots maintain uninterrupted context across phone and SMS without requiring complex routing logic.

Chat SDK adds Velt support · Vercel Embedding collaborative AI directly into user canvases and documents typically requires bespoke DOM manipulation and state synchronization. Vercel introduced the Velt adapter to seamlessly bridge collaborative UI components with their Chat SDK. The architecture cleanly maps Velt documents to standard chat channels, and comment annotations directly to threads. This allows bots to reply contextually inline, grounding their answers in anchored text without engineers having to build custom parsing layers.

Build custom Slack runtimes · Vercel Heavy chat frameworks often clash with applications that already possess mature, custom state management and routing. Recognizing this, Vercel decoupled their Slack adapter into discrete, standalone primitive imports. Engineers can now selectively ingest only the components they need, such as webhook payload validation or Block Kit conversion. This composable architecture prevents framework bloat, keeping imports clean for systems that simply want raw API capabilities without the overarching Chat runtime.

Build Chat SDK web UIs in Vue or Svelte · Vercel Building AI streaming interfaces across different frontend frameworks usually requires maintaining separate server-side streaming logic. Vercel eliminated this duplication by extending their AI SDK UI message stream protocol to natively support Vue and Svelte alongside React. Since all adapters consume the exact same backend payload, developers can implement reactive useChat interfaces universally. This standardization drastically simplifies the server architecture for polyglot engineering organizations.

How Okara runs CMO agents for 120,000 companies on Vercel · Okara Managing 4 billion tokens daily across eight sub-agents for 120,000 users forced Okara to abandon maintaining separate provider SDKs. They consolidated their entire infrastructure behind Vercel AI Gateway, offloading retry logic, fallback routing, and zero-data retention compliance entirely to the edge network. Furthermore, when their autonomous SEO agent generates code fixes, it utilizes Vercel Sandboxes to test these changes in highly isolated environments. This demonstrates how extremely small engineering teams can achieve massive operational scale by making their infrastructure integration layers completely invisible.

Program Claude Code, Codex, Pi and other agent harnesses with AI SDK · Vercel Switching between established agent harnesses traditionally requires extensive application rewrites due to deeply conflicting runtime structures and state definitions. Vercel released HarnessAgent to normalize access to critical components like skills, sandboxes, and permission flows through a single, unified abstraction layer. Because it strictly returns AI SDK-compatible results, developers can completely swap underlying agent frameworks without altering their frontend UI or event-handling code. This structural decoupling isolates the fast-moving iteration of agentic harnesses from the stable application logic, significantly reducing technical debt.

Kimi K2.7 Code now available on AI Gateway · Moonshot AI Executing long-horizon programming tasks and frontend development often demands models that can natively process both text and visual inputs simultaneously. Moonshot AI engineered Kimi K2.7 Code specifically for these complex developer workflows, integrating a native multimodal architecture that operates continuously in “thinking mode”. By making this model available on Vercel AI Gateway, developers gain immediate access to its capabilities. This allows teams to safely route mixed-modality inference requests while automatically benefiting from the gateway’s enterprise-grade failover and zero data retention safeguards.

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark · NVIDIA Agentic workloads, which chain dozens of LLM calls with constantly growing context windows, stress hardware entirely differently than single-shot chat inferences. Artificial Analysis launched AgentPerf to benchmark this specific profile, revealing that NVIDIA’s GB300 NVL72 runs up to 20x more agents per megawatt than the HGX H200. This immense performance jump results from extreme full-stack codesign: connecting 72 GPUs at rack-scale to efficiently distribute massive Mixture-of-Experts (MoE) models. By purposefully overlapping CUDA compute with communication to mask coordination latency, NVIDIA optimized their architecture to specifically absorb the unique penalties of agentic handoffs.

Mythos Begets Fable, Cursor’s Composer 2.5, Agents Building Agents · DeepLearning.AI The proliferation of agentic tools requires a deep focus on evaluating open-source desktop frameworks and handling proprietary model behavior. Anthropic introduced Claude Mythos 5 and a heavily guardrailed counterpart, Fable 5, the latter actively refusing or silently degrading on prompts regarding critical security or cutting-edge AI topics. Concurrently, Cursor bypassed generalist models entirely, fine-tuning their Composer 2.5 mixture-of-experts model specifically within their IDE harness using reinforcement learning that rewards brevity and clean tool usage. These developments highlight the severe engineering tension between imposing strict API-level safety guardrails and maximizing specialized developer productivity at the local environment level.

A new pkg.go.dev API for Go · Google Developers and AI systems previously relied on fragile scraping methods to extract required package metadata from the Go package registry. To solve this, the Go team released a structured, GET-only JSON API designed specifically for operational stability and efficient caching. By providing a formal machine-readable contract via OpenAPI, the Go ecosystem is actively standardizing its metadata distribution. This foundational work sets up the broader ecosystem for deterministic AI-assisted coding, ensuring large language models have precise contextual data when reasoning about Go packages.

This Week in AI: The Next-Gen Recommendation Experience · RecoMind Despite the widespread industry hype around conversational agents, true “agentic sales” requires a sophisticated recommendation engine to actively anticipate user needs. Cutting-edge recommendation architectures now treat user behavior as a sequence prediction problem, encoding all interactions into embeddings processed by massive foundation models. Since this critical interaction data isn’t publicly accessible, building these 1.5 trillion-parameter models demands massive proprietary datasets and serious compute capabilities. This dynamic severely separates top-tier platforms from mid-tier retailers who mistakenly attempt to build recommendation logic by relying solely on generic LLM APIs.

Patterns Across Companies

A major pattern across the industry is the deliberate constraining of agentic architectures to limit unpredictable autonomous behavior. GitHub and Rocket Close are intentionally restricting agent autonomy to lower latency and tool failure rates: GitHub by reducing unnecessary subagent delegation, Rocket Close by pulling data in a single call before the LLM step rather than through conversational loops. In addition, the adoption of standardized interfaces like the Model Context Protocol (MCP), used by Rocket Close and in the AWS and Cisco Webex integration, points to a structural shift toward cleanly separating orchestration logic from backend integration.