Engineering @ Scale - 2026-04-28

Signal of the Day

Embedding durable execution directly into services via a library—and leveraging existing host databases—removes the operational burden and single points of failure inherent to centralized orchestration clusters.

Deep Dives

Skipper: Building Airbnb’s embedded workflow engine · Airbnb · Source

To orchestrate durable execution for payments and insurance claims, Airbnb needed a solution that avoided the external dependencies and single points of failure common to dedicated orchestration clusters. They built Skipper, an embedded Java/Kotlin library that stores workflow state directly in the host service’s existing database (MySQL or UDS) rather than in a separate persistence layer. Skipper uses an in-memory execution queue for actions and a replay mechanism that reconstructs state from checkpointed results rather than from an event-sourced log. The major architectural tradeoff is that complexity shifts to the developer: workflows must be strictly deterministic, and actions must be idempotent because they execute at least once.
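The checkpoint-and-replay idea can be illustrated with a minimal Python sketch. This is not Skipper's API (Skipper is a Java/Kotlin library); the `WorkflowRun` class, the `checkpoints` table, and the `charge`/`send_receipt` actions are hypothetical names, and SQLite stands in for the host service's database. The in-memory execution queue is not modeled.

```python
import json
import sqlite3


class WorkflowRun:
    """Hypothetical runner: finished steps return their checkpointed result."""

    def __init__(self, db: sqlite3.Connection, run_id: str):
        self.db = db
        self.run_id = run_id
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "run_id TEXT, step TEXT, result TEXT, PRIMARY KEY (run_id, step))"
        )

    def action(self, step: str, fn, *args):
        row = self.db.execute(
            "SELECT result FROM checkpoints WHERE run_id = ? AND step = ?",
            (self.run_id, step),
        ).fetchone()
        if row is not None:
            # Replay path: reuse the recorded result instead of re-running fn.
            return json.loads(row[0])
        # First execution. A crash between fn() and the INSERT means fn runs
        # again on the next replay, so fn must be idempotent (at-least-once).
        result = fn(*args)
        self.db.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (self.run_id, step, json.dumps(result)),
        )
        self.db.commit()
        return result


calls = []  # records which actions really executed


def charge(amount):
    calls.append(("charge", amount))
    return {"charged": amount}


def send_receipt(charged):
    calls.append(("receipt", charged))
    return "sent"


def payment_workflow(run: WorkflowRun, amount: int):
    # Deterministic: the same step names in the same order on every replay.
    charged = run.action("charge", charge, amount)
    return run.action("receipt", send_receipt, charged["charged"])
```

Running `payment_workflow` twice with the same `run_id` executes each action once; the second pass only reads checkpoints. If the workflow body branched on wall-clock time or random values, the step sequence could differ between replays, which is why determinism is required.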

An update on GitHub availability · GitHub · Source

GitHub is re-architecting to support a 30X scale increase, driven heavily by an explosion in agentic development workflows that are stressing their infrastructure. At this scale, small inefficiencies compound rapidly: cache misses translate directly into database load, and retries create traffic storms that spill across unrelated product experiences. In response, engineering is prioritizing isolation and blast-radius reduction over feature delivery: moving off MySQL, rewriting performance-sensitive Ruby paths in Go, and explicitly decoupling critical services like Git and Actions. The core lesson is that tightly coupled monoliths cannot survive exponential machine-driven traffic; systems must be designed to degrade gracefully when individual subsystems back up.
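Two generic techniques for limiting retry storms and blast radius can be sketched in Python. This is an illustration of the general pattern, not GitHub's implementation; `Bulkhead` and `backoff_delay` are hypothetical names, and the default limits are arbitrary.

```python
import random
import threading


class Bulkhead:
    """Caps in-flight calls to one subsystem; excess calls are shed, not queued."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def call(self, fn, fallback):
        # Non-blocking acquire: when the subsystem is saturated, degrade
        # immediately instead of piling more work onto it.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return fn()
        finally:
            self._slots.release()


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng=random.random) -> float:
    """Exponential backoff with full jitter, so retries do not synchronize."""
    return rng() * min(cap, base * 2 ** attempt)
```

Giving each dependency its own `Bulkhead` means a slow subsystem can exhaust only its own slots, and callers fall back to a degraded response rather than waiting behind it.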

Securing the git push pipeline: Responding to a critical remote code execution vulnerability · GitHub · Source

A critical remote code execution (RCE) vulnerability, discovered via bug bounty, allowed attackers to execute arbitrary commands by injecting unsanitized characters into git push options. The root cause was a failure to sanitize the boundary between user input and an internal metadata protocol: injected delimiter characters let attackers spoof internal fields, override environments, and bypass sandbox restrictions. Although the engineering team shipped a sanitization fix within two hours, post-incident forensics revealed that the exploit relied on a legacy code path, left over from an older deployment model, that had not been purged from the container image. The takeaway is that defense-in-depth requires removing dead or unused code paths from production images, because they widen the attack surface that a boundary sanitization failure exposes.
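The class of bug described here, delimiter injection across a trust boundary, can be shown with a small Python sketch. The metadata format below (`key=value` pairs joined by `;`) is an assumption made for illustration and is not GitHub's internal protocol; `encode_metadata` and `decode_metadata` are hypothetical helpers. The point is that every user-supplied value is encoded before it is embedded, so it cannot introduce a delimiter.

```python
from urllib.parse import quote, unquote

FIELD_SEP = ";"  # hypothetical field delimiter of the internal metadata line
KV_SEP = "="     # hypothetical key/value delimiter


def encode_metadata(fields: dict) -> str:
    # quote(..., safe="") percent-encodes everything except letters, digits
    # and "_.-~", so ";", "=" and newlines in user input become %3B, %3D, %0A.
    return FIELD_SEP.join(
        f"{quote(key, safe='')}{KV_SEP}{quote(value, safe='')}"
        for key, value in fields.items()
    )


def decode_metadata(line: str) -> dict:
    fields = {}
    for part in line.split(FIELD_SEP):
        key, value = part.split(KV_SEP, 1)
        fields[unquote(key)] = unquote(value)
    return fields
```

With this encoding, a push option such as `"ci.skip;sandbox=off"` round-trips as a single opaque value instead of being parsed as a second `sandbox` field.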

Migrating a text agent to a voice assistant with Amazon Nova 2 Sonic · AWS · Source

Migrating an AI agent from text to voice fails when teams simply bolt on a speech interface, because a stateless request-response architecture cannot meet the strict low-latency and turn-taking requirements of voice. Instead of chaining ASR, LLM, and TTS models, which adds latency at every inference hop, AWS moved to a native bidirectional speech-to-speech model (Nova 2 Sonic) that handles Voice Activity Detection (VAD) and barge-ins internally. The team refactored sub-agents to return targeted, concise payloads rather than verbose JSON, and implemented asynchronous tool calling with filler audio to mask backend processing. Voice AI demands an architectural shift to persistent bidirectional streaming (WebSockets/WebRTC) and rigorous management of inference budgets.
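The "asynchronous tool calling with filler audio" pattern can be sketched with `asyncio`. This is a generic illustration and does not use the Nova 2 Sonic API; `call_tool_with_filler`, the `speak` callback, the filler phrase, and the 0.3 second threshold are all assumptions.

```python
import asyncio


async def call_tool_with_filler(tool, speak, filler_after: float = 0.3,
                                filler: str = "One moment while I check that."):
    """Run a tool call; if it is still pending after filler_after seconds,
    emit filler speech so the caller does not hear dead air."""
    task = asyncio.create_task(tool())
    done, _pending = await asyncio.wait({task}, timeout=filler_after)
    if not done:
        # The backend is slow: speak the filler while the tool keeps running.
        await speak(filler)
    return await task
```

A fast tool returns before the threshold and no filler is spoken; a slow one triggers exactly one filler phrase and then yields its result. In a real voice session `speak` would write audio to the open bidirectional stream.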

NVIDIA Nemotron 3 Nano Omni model now available on Amazon SageMaker JumpStart · NVIDIA/AWS · Source

Enterprise agentic systems often stitch together separate models for vision, speech, and language, which fragments context and amplifies latency across reasoning loops. To address this, NVIDIA released Nemotron 3 Nano Omni (30B total, 3B active parameters via a Mixture of Experts architecture) on SageMaker. This unified multimodal model processes up to 131K tokens of video, audio, image, and text in a single inference pass. By converging perception into a single foundation model, engineers can drastically simplify orchestration logic and eliminate the cross-model synchronization overhead that usually plagues complex agent pipelines.

How Slack Manages Context in Long-running Multi-agent Systems · Slack · Source

As agentic systems run for extended periods, simply appending new data to raw chat logs causes the system to lose coherence and accuracy. Slack engineering addressed this context degradation by replacing log accumulation with structured memory, active validation, and distilled truth. The lesson is that continuous state distillation is required to keep context accurate and bounded in autonomous, long-running agent loops.
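The summary does not describe Slack's data model, so the following Python sketch is only one possible reading of "structured memory with validation": observations are checked before they are stored, and each fact overwrites its previous value instead of being appended to a log. The `Memory` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical structured memory: one current value per fact key."""

    facts: dict = field(default_factory=dict)

    def observe(self, key: str, value: str, validate) -> bool:
        # Validate before storing; rejected observations never reach the prompt.
        if not validate(key, value):
            return False
        # Overwrite rather than append, so stale values do not accumulate.
        self.facts[key] = value
        return True

    def context(self) -> str:
        # The rendered context grows with the number of distinct facts,
        # not with the number of turns the agent has run.
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.facts.items()))
```

After a hundred updates to the same key, `context()` still renders a single line, whereas an append-only log would carry all hundred entries into every later prompt.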

Patterns Across Companies

A distinct architectural pattern this period is the aggressive consolidation of system hops to fight latency and complexity. Airbnb chose an embedded library to bypass external orchestration network calls, while AWS and NVIDIA are collapsing multi-model AI pipelines into single, natively multimodal inference passes. Whether in traditional distributed state management or modern AI architectures, teams are realizing that network boundaries and complex orchestration layers are becoming unacceptable bottlenecks.