Engineering @ Scale — 2026-03-31

Signal of the Day

Meta bent the inference scaling curve of its trillion-parameter Adaptive Ranking Model from linear to sub-linear by shifting to a request-oriented computation flow, showing that co-designing models with hardware (like In-Kernel Broadcasts) can push past physical memory limits without sacrificing sub-second latency.

Deep Dives

Discord Open Sources Osprey Safety Rules Engine · Discord Processing 400 million daily actions for real-time safety requires extreme throughput without sacrificing developer accessibility. Discord built Osprey with a polyglot architecture, using a high-performance Rust coordinator to manage traffic routing. Stateless Python workers then execute the actual business logic using a custom domain-specific language called SML. This tradeoff decouples core infrastructure performance from rule deployment, allowing trust and safety teams to rapidly push mitigations. Separating traffic coordination from accessible execution environments is a highly reusable pattern for real-time rules engines.

KubeVirt v1.8 Brings Multi-Hypervisor Support and Confidential Computing to Kubernetes · KubeVirt Tightly coupling Kubernetes virtualization to KVM limits deployment flexibility and security architectures. In version 1.8, maintainers introduced a Hypervisor Abstraction Layer (HAL) to decouple KubeVirt from specific backends. While this abstraction introduces minor routing overhead, it unlocks support for alternative hypervisors and advanced confidential computing environments. This demonstrates the value of building rigorous abstraction layers to prevent platform orchestrators from being locked into single hardware execution models.
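The shape of a hypervisor abstraction layer, sketched in Python with hypothetical backend names (KubeVirt's HAL is Go and far richer): the orchestrator codes against one interface and pays only a dispatch lookup to swap backends.

```python
from abc import ABC, abstractmethod

class HypervisorBackend(ABC):
    """The orchestrator depends on this interface, never on a backend."""
    @abstractmethod
    def start_vm(self, name: str) -> str: ...

class KvmBackend(HypervisorBackend):
    def start_vm(self, name: str) -> str:
        return f"kvm:{name}:running"

class CloudHvBackend(HypervisorBackend):
    def start_vm(self, name: str) -> str:
        return f"cloud-hypervisor:{name}:running"

BACKENDS: dict[str, HypervisorBackend] = {
    "kvm": KvmBackend(),
    "cloud-hypervisor": CloudHvBackend(),
}

def launch(backend: str, vm: str) -> str:
    # The dict lookup is the "minor routing overhead" the abstraction
    # costs; switching execution models costs nothing else upstream.
    return BACKENDS[backend].start_vm(vm)
```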

QCon London 2026: Team Topologies as the ‘Infrastructure for Agency’ with AI · QCon Scaling AI agents successfully is constrained more by organizational maturity than by the underlying technology. Applying Team Topologies frameworks allows companies to treat AI agents as entities requiring bounded agency, security, and stewardship. Instead of centralized AI teams, organizations are relying on Innovation and Practices Enabling Teams to optimize internal processes and diffuse knowledge. Treating autonomous agents as organizational actors that require strict scoping is crucial for seeing real-world ROI.

Failure As a Means to Build Resilient Software Systems: A Conversation with Lorin Hochstein · InfoQ Automated fault injection tools often fail to replicate the complex, cascading nature of real-world systemic failures. Engineers must treat actual production incidents as the primary lens for understanding how software systems truly operate under stress. The tradeoff is accepting that proactive chaos engineering, while useful for basic robustness, cannot replace the deep architectural context gained from mitigating messy, human-in-the-loop outages. Post-incident analysis should be utilized as a formal feedback loop for system design rather than just an operational chore.

Cloudflare Adds Active API Vulnerability Scanning to Its Edge · Cloudflare APIs represent a rapidly expanding attack surface that requires dynamic, real-time security validation. Cloudflare addresses this by integrating a Dynamic Application Security Testing (DAST) tool directly into its API Shield platform at the edge network. Moving active scanning to the edge reduces load on origin servers but requires highly distributed, synchronized state management across global points of presence. Shifting application security testing out of CI/CD pipelines and into edge routing layers allows for continuous, real-world validation.

Event-Driven Patterns for Cloud-Native Banking: Lessons from What Works and What Hurts · InfoQ Decoupling banking systems introduces complex distributed transaction challenges and new failure modes. To guarantee reliable state transitions, architects must implement patterns like the transactional inbox/outbox and strictly stable event contracts. The tradeoff involves accepting operational complexity and eventual consistency in exchange for highly scalable services and clear, decoupled activity trails. Implementing outbox patterns ensures atomic updates across local databases and distributed message brokers, a necessity for financial ledgers.
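A minimal transactional-outbox sketch using SQLite as a stand-in for the local database (schema and topic names are illustrative): the ledger write and the event land in one atomic transaction, and a separate relay publishes to the broker.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE ledger (account TEXT, delta INTEGER);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT,
                         payload TEXT, published INTEGER DEFAULT 0);
""")

def transfer(account: str, delta: int) -> None:
    with db:  # one atomic transaction: state change + event, or neither
        db.execute("INSERT INTO ledger VALUES (?, ?)", (account, delta))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("ledger.updated",
                    json.dumps({"account": account, "delta": delta})))

def relay() -> list[dict]:
    """A poller publishes unpublished events to the broker, then marks
    them. Delivery is at-least-once; consumers deduplicate by id."""
    with db:
        rows = db.execute(
            "SELECT id, payload FROM outbox WHERE published = 0").fetchall()
        db.executemany("UPDATE outbox SET published = 1 WHERE id = ?",
                       [(i,) for i, _ in rows])
    return [json.loads(p) for _, p in rows]
```

The key property for a financial ledger is that no event can exist without its ledger row, and no ledger row without its event.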

TanStack Start Introduces Import Protection to Enforce Server and Client Boundaries · TanStack Full-stack React applications risk accidentally leaking sensitive server code or secrets into client-side bundles. A new Vite plugin automatically checks imports during the development and build processes, utilizing explicit markers and file naming conventions. By enforcing these boundaries at compile-time, the system blocks harmful imports without requiring developer discipline or complex runtime checks. Leveraging compiler-level plugins to enforce hard security boundaries is a powerful pattern for isomorphic codebases.
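The core check is simple to sketch. TanStack's plugin operates on TypeScript inside Vite; here is the same idea as a Python stand-in, using Python's `ast` module and a hypothetical list of server-only module names in place of TanStack's markers and file conventions.

```python
import ast

def find_leaks(client_source: str, server_modules: set[str]) -> list[str]:
    """Build-time sketch: parse a client file and flag any import that
    resolves to a server-only module. A build fails if leaks is non-empty."""
    leaks = []
    for node in ast.walk(ast.parse(client_source)):
        if isinstance(node, ast.ImportFrom) and node.module in server_modules:
            leaks.append(node.module)
        elif isinstance(node, ast.Import):
            leaks += [a.name for a in node.names if a.name in server_modules]
    return leaks
```

Running the check at compile time means the boundary holds even when a developer forgets it exists.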

Kubernetes Autoscaling Demands New Observability Focus Beyond Vendor Tooling · InfoQ Traditional infrastructure metrics fall short in environments utilizing dynamic, rapid autoscalers like Karpenter. Engineering teams are shifting their observability strategies to monitor provisioning behavior, scheduling latency, and overall cost efficiency. This approach prioritizes deep, platform-agnostic scheduling insights over vendor-specific physical node metrics. Aligning your telemetry strategy with the abstraction level of your orchestrator is necessary as infrastructure becomes entirely ephemeral.
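What "monitoring scheduling latency" means in practice can be sketched as a small reduction over the event stream; the event field names below are illustrative, not any vendor's schema.

```python
def scheduling_latencies(events: list[dict]) -> dict[str, float]:
    """Per-pod time from creation to scheduling, summarized from raw
    events -- an orchestrator-level signal, not a node metric."""
    created: dict[str, float] = {}
    scheduled: dict[str, float] = {}
    for e in events:
        if e["type"] == "PodCreated":
            created[e["pod"]] = e["ts"]
        elif e["type"] == "PodScheduled":
            scheduled[e["pod"]] = e["ts"]
    lat = sorted(scheduled[p] - created[p] for p in scheduled if p in created)
    return {"p50": lat[len(lat) // 2], "max": lat[-1]}
```

A rising p50 here flags provisioner misbehavior long before any physical node metric would.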

Hidden Decisions You Don’t Know You’re Making · InfoQ Engineering culture and software architecture are frequently shaped by invisible defaults, such as CI/CD bottlenecks or misaligned metrics. Leaders must identify the “decision behind the decision” to deliberately align incentives with high-performing behaviors. This often means sacrificing short-term feature velocity to untangle platform complexity and formalize implicit constraints. Treating process defaults as formal architectural decisions prevents silent structural decay in scaling organizations.

Agentic AI Patterns Reinforce Engineering Discipline · InfoQ AI-assisted development can lead to chaotic codebases if it lacks structured patterns for high-quality delivery. To ground generative AI, engineering teams are adopting specification-driven development and remixing as core workflows. This shifts the developer’s focus from writing implementation code to authoring rigorous, testable specifications that constrain AI outputs. Using specification-first design is critical to bounding the non-deterministic nature of AI code generation.
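Specification-first development can be made concrete with an executable spec. In this hypothetical sketch the human authors the spec table; AI-generated code is accepted only if it satisfies every case, bounding the non-determinism of generation.

```python
# Hypothetical spec for a slugify function, authored by the developer.
SPEC = [
    ("Hello World", "hello-world"),
    ("  trim  me ", "trim-me"),
]

def satisfies_spec(candidate) -> bool:
    """Acceptance gate: generated code must pass the whole spec."""
    return all(candidate(inp) == out for inp, out in SPEC)

def ai_generated_slugify(text: str) -> str:
    # Stand-in for an AI-produced implementation under review.
    return "-".join(text.lower().split())
```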

PyPI Supply Chain Attack Compromises LiteLLM · FutureSearch A successful supply chain attack against LiteLLM, a package downloaded roughly 3 million times daily, highlights the vulnerabilities in modern AI ecosystems. Attackers distributed a compromised version on PyPI containing a malicious payload designed to harvest and exfiltrate sensitive data. This demonstrates the severe tension between rapid open-source adoption and the necessity of strict dependency pinning and artifact verification. Implementing zero-trust dependency management and egress network filtering is essential to mitigate downstream remote code execution.
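Artifact verification, one half of the mitigation, reduces to comparing a downloaded artifact's digest against a pinned hash. A minimal sketch with illustrative file names and hashes (real workflows pin hashes in lockfiles, e.g. pip's `--require-hashes` mode):

```python
import hashlib

# Pinned known-good digests; values here are derived for illustration.
PINNED = {"litellm-1.0.0.whl": hashlib.sha256(b"known-good-bytes").hexdigest()}

def verify(name: str, data: bytes) -> bool:
    """Refuse any artifact whose sha256 does not match the pin --
    including artifacts with no pin at all."""
    expected = PINNED.get(name)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

The other half, egress filtering, stops the payload from phoning home even if verification is bypassed.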

Can your governance keep pace with your AI ambitions? AI risk intelligence in the agentic era · AWS Traditional IT governance frameworks designed for static deployments cannot secure agentic AI, which operates non-deterministically and chooses unique workflows. AWS built AI Risk Intelligence (AIRI), an automated engine that continuously extracts evaluation criteria from frameworks (like OWASP) and reasons over system evidence (architectures, configurations). Because agents can seamlessly mask data exfiltration within authorized permissions, AIRI uses semantic entropy to measure output consistency, triggering human review for ambiguous findings. Replacing binary security rules with continuous, reasoning-based evaluations is mandatory for multi-agent coordination.
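The semantic-entropy idea can be sketched briefly. This is the concept only, not AIRI's implementation: sample the agent several times, cluster the outputs into meaning classes (here crudely by normalized text; real systems use an NLI or embedding model), and escalate when the entropy over clusters is high.

```python
import math
from collections import Counter

def semantic_entropy(samples: list[str]) -> float:
    """Entropy over meaning clusters of repeated agent outputs."""
    clusters = Counter(s.strip().lower() for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in clusters.values())

def needs_human_review(samples: list[str], threshold: float = 0.9) -> bool:
    # Consistent answers -> low entropy -> auto-accept.
    # Ambiguous answers -> high entropy -> route to a human.
    return semantic_entropy(samples) > threshold
```

The threshold is the policy knob: it trades reviewer load against the risk of accepting an inconsistent finding.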

AWS launches frontier agents for security testing and cloud operations · AWS Manual penetration testing is too slow to cover rapidly changing portfolios, and incident resolution requires correlating telemetry across sprawling multicloud stacks. AWS’s new “frontier agents” ingest source code and architecture diagrams to independently orchestrate targeted attack chains, or trace live incidents back to exact deployment changes. Giving autonomous systems persistent access to operate for hours or days without human oversight fundamentally trades traditional manual control for massively scaled, continuous remediation. Elevating AI from stateless chatbots to persistent background operators is the next evolution of site reliability engineering.

Accelerating software delivery with agentic QA automation using Amazon Nova Act · AWS Traditional UI automation frameworks rely on brittle code-level identifiers (DOM selectors), causing tests to break during harmless layout refactors. Amazon Nova Act utilizes a custom computer-use model that interacts with applications via visual understanding and natural language, entirely bypassing code inspection. This removes the maintenance burden of technical locators, allowing teams to translate product requirements directly into test definitions. Decoupling test specifications from implementation details via visually-reasoned agentic testing drastically reduces maintenance overhead.

Building an AI powered system for compliance evidence collection · AWS Compliance audits typically require hundreds of manual screenshots across multiple authenticated systems, which is both time-consuming and difficult to reproduce. Teams can automate this using a browser extension paired with Amazon Nova 2 Lite, which parses compliance documents to generate executable JSON workflows that capture organized, timestamped evidence. The workflow engine handles complex edge cases like MFA via a “wait_for_user” step, maintaining a human-in-the-loop while the agent handles navigation and state validation. Bridging non-API-accessible enterprise apps with agent-driven UI automation solves significant operational toil.
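The executable-workflow idea, including the human-in-the-loop pause, can be sketched with a tiny interpreter. The step schema below is hypothetical, not the actual Nova-generated format:

```python
def run_workflow(steps: list[dict], driver, ask_user) -> list[str]:
    """Execute a JSON-style workflow; 'wait_for_user' pauses the run so
    a human completes steps the agent must not (e.g. MFA)."""
    log = []
    for step in steps:
        kind = step["action"]
        if kind == "navigate":
            driver.goto(step["url"])
            log.append(f"navigate:{step['url']}")
        elif kind == "wait_for_user":
            ask_user(step["prompt"])  # blocks until the human confirms
            log.append("wait_for_user")
        elif kind == "capture":
            # A real run would save a timestamped, labeled screenshot.
            log.append(f"capture:{step['label']}")
        else:
            raise ValueError(f"unknown step {kind!r}")
    return log

class FakeDriver:
    """Stand-in for the browser-extension driver."""
    def goto(self, url: str) -> None:
        pass
```

The returned log doubles as the reproducible audit trail the manual screenshot process lacked.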

Build a FinOps agent using Amazon Bedrock AgentCore · AWS Obtaining a unified view of cloud spending requires correlating data from disparate billing and optimization consoles. AWS solved this by building a FinOps agent on AgentCore Runtime, utilizing the Model Context Protocol (MCP) to orchestrate 24 specialized tools via an LLM. To ensure secure communication between the Gateway and MCP servers, AgentCore Identity manages machine-to-machine OAuth 2.0 credential lifecycles using Amazon Cognito. Utilizing MCP allows teams to cleanly decouple LLM orchestration from the execution and credential management of sensitive backend APIs.

Build reliable AI agents with Amazon Bedrock AgentCore Evaluations · AWS Because LLMs are non-deterministic, a single successful test pass cannot guarantee an agent will reliably select the correct tools or synthesize accurate responses in production. AWS built AgentCore Evaluations to ingest OpenTelemetry traces and apply LLM-as-a-judge and custom Lambda code evaluators across three hierarchical levels: session, trace, and individual tool-calls. This allows teams to utilize cheap, deterministic Lambda checks for exact data validation (like formatting rules) while reserving expensive LLM judges for subjective metrics like “helpfulness”. Deconstructing agent behavior into instrumented traces is required to escape the cycle of manual, reactive debugging.
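The two-tier evaluator split can be sketched simply. This is illustrative, not the AgentCore Evaluations API: deterministic checks run on every span, and the expensive judge runs only where subjectivity requires it.

```python
from typing import Callable

def format_check(span: dict) -> bool:
    """Deterministic 'Lambda-style' evaluator: cheap, exact validation
    (here, a hypothetical currency-formatting rule)."""
    return span.get("currency") == "USD"

def evaluate_trace(spans: list[dict],
                   judge: Callable[[str], float]) -> dict:
    """Run cheap checks across all tool-call spans; pay for the
    LLM-as-a-judge only on the final synthesized answer."""
    return {
        "format_pass": all(format_check(s) for s in spans),
        "helpfulness": judge(spans[-1].get("output", "")),
    }
```

In production the spans would come from OpenTelemetry traces and the judge from a model call; the routing logic is the reusable part.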

Agent-driven development in Copilot Applied Science · GitHub Analyzing the massive trajectories generated by agent evaluation benchmarks presents an impossible manual workload for engineers. Using the Copilot CLI and Claude Opus, GitHub researchers automated this toil by designing an “agent-first” repository optimized for AI contributions via MCP servers. By adopting a “blame process, not agents” philosophy, they enforce strict typing, linters, and contract testing to automatically bound the agent’s blast radius. Optimizing codebases for AI consumption—through rigorous architecture and documentation—unlocks extremely rapid, collaborative agent development.

Streamlining access to powerful disaster recovery capabilities of AWS · AWS Restoring data alone is insufficient for disaster recovery; modern workloads require the exact recreation of compute, networking, configurations, and persistent attachments in a new region or account. By integrating AWS Backup and Elastic Disaster Recovery with Arpio’s SaaS platform, teams can automatically discover, translate, and restore entire workload environments. A critical architectural component is dynamically translating database endpoints and updating Route 53 CNAME records in the recovery VPC to seamlessly redirect applications. Treating infrastructure translation and dynamic DNS routing as core pillars of workload recovery eliminates the heavy lifting of custom scripting.
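The DNS-redirection step reduces to a Route 53 UPSERT. A minimal sketch that builds the change batch (record and target names are illustrative); in a real run it would be passed to boto3's `change_resource_record_sets`:

```python
def cname_upsert_batch(record: str, target: str, ttl: int = 60) -> dict:
    """ChangeBatch that repoints the application's stable hostname at
    the translated endpoint in the recovery region."""
    return {
        "Comment": "DR failover: repoint app at recovered database",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }],
    }

# Real invocation (not executed here):
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z...", ChangeBatch=cname_upsert_batch(...))
```

A short TTL keeps failback cheap: applications re-resolve quickly once the primary region returns.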

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads · Meta Serving trillion-parameter ads recommendation models typically introduces massive latency and memory bottlenecks that degrade user experience. Meta solved this by shifting to a Request-Oriented Optimization architecture, which computes high-density user signals once per request rather than redundantly for every ad candidate. Combined with “Wukong Turbo” hardware-aware graph specialization and multi-card embedding sharding, this eliminates memory limits and boosts Model FLOPs Utilization (MFU) to 35%. Aggressively co-designing models with hardware—including selective FP8 quantization—is mandatory to bend the physical limits of inference latency at scale.
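The request-oriented idea is, at its core, a hoisting transformation: move the expensive user-side computation out of the per-candidate loop so its cost is O(1) per request instead of O(number of ads). A toy sketch with illustrative stand-in functions:

```python
CALLS = {"user_tower": 0}  # instrumentation to show the hoisting

def user_tower(user: dict) -> float:
    """Expensive user-signal computation: run once per request."""
    CALLS["user_tower"] += 1
    return sum(user["history"]) / len(user["history"])

def score(user_signal: float, ad: dict) -> float:
    """Cheap per-candidate interaction: runs for every ad."""
    return user_signal * ad["quality"]

def rank(user: dict, ads: list[dict]) -> list[dict]:
    signal = user_tower(user)  # computed once, reused for all candidates
    return sorted(ads, key=lambda ad: score(signal, ad), reverse=True)
```

The real system applies this at trillion-parameter scale, where the hoisted portion dominates cost; the toy shows why the curve bends from linear to sub-linear in candidate count.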

Web Excursions for March 31st, 2026 · Brett Terpstra Maintaining fast, accurate voice-to-text workflows typically relies on high-latency cloud APIs with privacy tradeoffs. The highlighted “Steno” application solves this by utilizing Apple Intelligence to provide native, sub-second dictation directly on macOS. Trading massive cloud model capacity for strict local-only privacy and zero-latency execution via native Swift frameworks creates a superior user experience. Pushing inference directly to edge devices is highly effective when privacy and latency strictly outweigh raw model size.

Open to Work: How to Get Ahead in the Age of AI · Microsoft The accelerating integration of AI is destabilizing traditional, predictable career ladders and workflows. Enterprise software engineering is responding by designing AI tools like Copilot to act as canvases for human collaboration rather than strict replacements for human roles. This approach requires focusing tool design on augmenting individual skills and ensuring a human-in-the-loop dynamic. Designing enterprise products to amplify uniquely human context prevents the brittleness associated with purely autonomous workflows.

How Meta Turned Debugging Into a Product · Meta Incident investigation heavily relies on tribal knowledge and runbooks that quickly go stale in complex microservice architectures. Meta engineered “DrP,” a platform where engineers write programmable investigation workflows (“analyzers”) as code, which execute automatically and chain across service boundaries when an alert fires. By forcing debugging logic through CI/CD, backtesting, and code review, Meta transformed operational toil into highly maintained, testable software artifacts. Codifying diagnostic workflows into composable programs drastically reduces resolution times compared to isolated, undocumented shell scripts.
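The analyzers-as-code idea can be sketched as plain composable functions (DrP's real API is internal; names below are invented): each analyzer takes an alert context, returns findings, and can hand off across service boundaries, all as reviewable, testable code.

```python
def payments_analyzer(ctx: dict) -> list[str]:
    if ctx.get("payments_deploy_recent"):
        return ["payments: recent deploy is the likely culprit"]
    return []

def checkout_analyzer(ctx: dict) -> list[str]:
    findings = []
    if ctx["error_rate"] > 0.05:
        findings.append("checkout: elevated error rate")
        # Chain into the upstream service's analyzer automatically.
        findings += payments_analyzer(ctx)
    return findings

def on_alert(ctx: dict) -> list[str]:
    """Entry point the alerting system triggers automatically."""
    return checkout_analyzer(ctx) or ["no findings; page the on-call"]
```

Because analyzers are functions, they can be unit-tested and backtested against historical incidents, which is what keeps them from going stale like runbooks.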

Accelerating the next phase of AI · OpenAI Scaling frontier AI models globally requires overcoming massive physical infrastructure and compute bottlenecks. OpenAI raised $122 billion in new funding specifically to invest in next-generation compute and meet enterprise demand for models like ChatGPT and Codex. This highlights a strategic tradeoff: prioritizing massive capital expenditure on physical data centers over purely algorithmic optimization on existing hardware. At the absolute edge of machine learning, hardware scale remains the primary gating factor to capability expansion.

Axios package compromise and remediation steps · Vercel A severe supply chain attack compromised the widely-used axios npm package, injecting a malicious payload into versions 1.14.1 and 0.30.4. Vercel protected its platform by immediately blocking outgoing access from their build infrastructure to the attacker’s Command & Control hostname. Enforcing strict network-level egress blocking at the platform layer successfully neutralized the threat without waiting for individual tenants to rotate keys or downgrade dependencies. Zero-trust egress filtering in CI/CD and build environments is critical to surviving remote code execution attempts via compromised packages.
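The egress control reduces to a hostname policy check at the platform layer. A minimal sketch, with a placeholder C2 hostname, of the decision a build proxy or firewall would make per outbound connection:

```python
from urllib.parse import urlparse

# Denylist of known C2 hosts (placeholder value for illustration).
BLOCKED_HOSTS = {"c2.attacker.example"}

def egress_allowed(url: str) -> bool:
    """Deny outbound connections to blocked hosts and their subdomains;
    applied platform-wide, so no tenant action is needed."""
    host = urlparse(url).hostname or ""
    return host not in BLOCKED_HOSTS and not any(
        host.endswith("." + blocked) for blocked in BLOCKED_HOSTS)
```

A stricter zero-trust posture inverts this into an allowlist, which also neutralizes C2 hosts discovered only after the fact.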

How FLORA shipped a creative agent on Vercel’s AI stack · FLORA Orchestrating visual AI agents requires highly parallel, long-running processes (like asynchronous image generation) that standard serverless functions cannot handle reliably. FLORA migrated from a fragmented setup to Vercel’s integrated AI Stack, combining the AI SDK for primitives with the Workflow SDK for durable orchestration. This architectural shift provided fluid compute capable of persisting state, managing retries, and fanning out jobs without losing progress during minutes-long executions. Using durable execution frameworks is essential to manage state for non-deterministic, long-running agent workflows.
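The durable-execution property can be sketched in miniature (this is the concept, not the Workflow SDK API): each completed step's result is persisted, so a crashed or restarted run resumes without redoing finished work, and transient failures are retried.

```python
def durable_run(steps, state: dict, max_retries: int = 3) -> dict:
    """steps: list of (name, fn). state: persisted step results --
    in production a durable store, here a plain dict."""
    for name, fn in steps:
        if name in state:      # completed on a prior attempt: skip
            continue
        for attempt in range(max_retries):
            try:
                state[name] = fn()   # checkpoint the result
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
    return state
```

Fanning out parallel jobs is the same idea with one checkpoint per branch; the invariant is that progress survives any single crash.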

Build with Veo 3.1 Lite, our most cost-effective video generation model · Google Deploying high-fidelity video generation models at scale introduces significant computational expense and latency constraints. Google released Veo 3.1 Lite in preview through the Gemini API and AI Studio to provide a more cost-effective inference tier. This trades absolute video fidelity for significantly improved API economics, allowing developers to scale generation pipelines efficiently. Offering multi-tiered model variants (Lite vs. Pro) is a crucial pattern for giving API consumers granular control over the cost-performance continuum.

Efficiency at Scale: NVIDIA, Energy Leaders Accelerating Power‑Flexible AI Factories to Fortify the Grid · NVIDIA Massive AI data centers place severe strain on static power grids, threatening reliability and inflating infrastructure costs. Utilizing the Vera Rubin DSX architecture and Emerald AI’s platform, organizations can treat AI factories as flexible grid assets that dynamically modulate compute loads. By dynamically throttling AI token generation based on real-time grid conditions, operators trade maximum sustained throughput for overall grid stability and faster interconnection approvals. Architecting compute clusters to act as variable power loads is necessary to scale physical infrastructure in energy-constrained environments.
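The load-modulation policy can be sketched as a simple mapping from grid stress to token throughput; the thresholds and rates below are illustrative, not from any vendor's controller.

```python
def token_budget(grid_load: float, max_tps: int = 1000) -> int:
    """Scale AI token throughput down as real-time grid stress rises,
    trading peak throughput for grid stability."""
    if grid_load < 0.7:
        return max_tps              # grid healthy: full throughput
    if grid_load < 0.9:
        return int(max_tps * 0.5)   # stressed: shed half the load
    return int(max_tps * 0.1)       # emergency: minimum viable service
```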

When AI Breaks the Systems Meant to Hear Us · O’Reilly Public feedback systems designed around human-scale friction face catastrophic “process shock” when overwhelmed by zero-cost, AI-generated submissions. To combat both synthetic amplification and bad-actor fabrication, institutions are deploying AI-driven topic modeling (like the UK’s Consult tool) alongside strict identity verification laws. This forces governments to abandon the assumption that submission volume equates to genuine human effort, requiring machines to parse what machines generate. When the cost of participation drops to zero, systems must fundamentally shift from measuring input volume to verifying identity and extracting semantic consensus.

“Conviction Collapse” and the End of Software as We Know It · O’Reilly The sheer speed of AI code generation is causing “conviction collapse” among founders, who can now build entirely new products instantly rather than defending a single, thoroughly-researched vision. As a result, software is shifting from being a static “product” to a dynamic process, consisting of disposable applications and extracted situational “skills” (like specialized AI code review personas). Instead of building massive, universal SaaS platforms, engineers are treating code as an instantly malleable medium. The competitive moat in software is moving away from application scaffolding and toward highly personalized, context-aware prompt skills.

Introducing Programmable Flow Protection: custom DDoS mitigation logic for Magic Transit customers · Cloudflare Standard DDoS protections struggle to mitigate attacks on proprietary, connectionless UDP protocols (like custom game engines) because they lack deep protocol context. Cloudflare’s Programmable Flow Protection allows customers to write and deploy custom eBPF programs across the global edge network, defining logic to statefully inspect, challenge, and drop specific packet payloads. Running untrusted customer eBPF code in edge userspace VMs achieves extreme customizability while securely isolating the underlying Linux kernel. Pushing programmable logic via eBPF to the absolute network edge is the frontier of deeply integrated, application-specific network security.
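The kind of stateful, protocol-aware logic such a program expresses can be illustrated in Python (real deployments compile eBPF and run it in isolated userspace VMs at the edge; the magic bytes below are a hypothetical proprietary-protocol header):

```python
# Flows that have completed a valid handshake, keyed by (ip, port).
SEEN_HANDSHAKES: set[tuple[str, int]] = set()
MAGIC = b"\x7fGAME"  # hypothetical header for a custom UDP protocol

def handle_packet(src: tuple[str, int], payload: bytes) -> str:
    if src in SEEN_HANDSHAKES:
        return "pass"                # established flow: admit
    if payload.startswith(MAGIC):
        SEEN_HANDSHAKES.add(src)     # valid handshake: remember the flow
        return "pass"
    return "drop"                    # junk never reaches the origin
```

The point of pushing this to the edge is that the drop happens at every point of presence, so the flood is absorbed globally instead of converging on the origin.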

Patterns Across Companies

The shift from deterministic code to stochastic agent execution is forcing a massive rethink in software architecture; teams at AWS, GitHub, and Meta are managing this by strictly bounding AI with robust contracts, API specifications, and OpenTelemetry trace-based evaluations. Simultaneously, security logic is migrating to the absolute edge, with Cloudflare’s userspace eBPF for DDoS, Meta’s edge preprocessing for Ads, and Vercel’s platform-level egress blocking demonstrating a clear trend of neutralizing malicious intent before it ever reaches the origin. Finally, the definition of “Software” itself is mutating: as instant code generation enables disposable applications, the competitive moat is moving rapidly toward continuous system verification, identity attestation, and hyper-personalized execution skills.