
Engineering @ Scale - 2026-06-06

Signal of the Day

Cloudflare found that at sufficient scale, database query planning itself can become the bottleneck. By replacing an exclusive lock with a shared lock and eliminating per-query copies of the parts list in ClickHouse, the team unblocked its high-throughput billing pipeline.

Deep Dives

Resolving Query Planning Contention in ClickHouse · Cloudflare

When Cloudflare’s billing pipeline slowed down, engineers traced the root cause not to query execution but to contention during the query planning stage in ClickHouse. An exclusive lock and a per-query copy of the parts list were serializing concurrent queries. The team patched ClickHouse to replace the exclusive lock with a shared lock, drop the redundant copy of the parts list, and improve part filtering. The tradeoff is somewhat more complex concurrent lock management in exchange for far less time spent queueing on the lock. It is a useful reminder that at high throughput, metadata operations and the query planner can become the bottleneck before raw I/O does.
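To make the two changes concrete, here is a minimal Python sketch of the general technique, not Cloudflare's patch and not ClickHouse's code (which is C++). The names `ReadWriteLock`, `PartsCatalog`, `snapshot`, and `plan`, and the `(name, min_key, max_key)` part layout, are all hypothetical. Planners take the lock in shared mode and receive a reference to an immutable snapshot instead of a private copy; only a writer that changes the parts list takes the lock exclusively.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Minimal reader-writer lock: many shared holders OR one exclusive holder.

    Illustrative only: it has no writer preference, so a steady stream of
    readers can starve a writer.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class PartsCatalog:
    """Hypothetical parts list: tuples of (name, min_key, max_key)."""

    def __init__(self, parts=()):
        self._lock = ReadWriteLock()
        self._parts = tuple(parts)  # immutable, so it is safe to share

    def add_part(self, part):
        # Writers build a new tuple; snapshots already handed out stay valid.
        with self._lock.exclusive():
            self._parts = self._parts + (part,)

    def snapshot(self):
        # Shared mode: concurrent planners do not block each other,
        # and each gets a reference rather than a per-query copy.
        with self._lock.shared():
            return self._parts

    def plan(self, min_key, max_key):
        # Part filtering runs outside the lock, on the shared snapshot.
        parts = self.snapshot()
        return [name for name, lo, hi in parts
                if lo <= max_key and hi >= min_key]
```

With this shape, the lock is held only long enough to read one reference, so planning cost no longer grows with the number of concurrent queries multiplied by the size of the parts list.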

Bifurcating Hardware for Training vs. Inference · Google

As deep learning workloads scale, the hardware requirements of training and inference diverge sharply. Google addressed this in its 8th generation TPUs by splitting the architecture into two distinct chips: the TPU 8t, built for raw training throughput, and the TPU 8i, optimized for inference latency and chip-to-chip speeds. The crucial architectural decision was to keep this hardware divergence from leaking into the developer experience. Both chips share the same Axion CPUs, liquid cooling infrastructure, and software stack, so code written for one runs on the other. This demonstrates a useful pattern for platform teams: specialize hardware or backend infrastructure aggressively for specific bottlenecks, but maintain a single abstraction layer for the software.
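The "specialize underneath, unify on top" pattern can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Google's software stack or any TPU API: `Backend`, `ThroughputBackend`, `LatencyBackend`, `scale_and_shift`, and `execute` are hypothetical names. The two backends schedule work differently, while application code depends only on the shared interface.

```python
from typing import Callable, List, Protocol, Sequence


class Backend(Protocol):
    """The single interface that application code is allowed to see."""

    def run(self, fn: Callable[[float], float],
            inputs: Sequence[float]) -> List[float]: ...


class ThroughputBackend:
    """Hypothetical stand-in for a training-oriented target: works in chunks."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size

    def run(self, fn, inputs):
        out: List[float] = []
        for start in range(0, len(inputs), self.chunk_size):
            chunk = inputs[start:start + self.chunk_size]
            out.extend(fn(x) for x in chunk)
        return out


class LatencyBackend:
    """Hypothetical stand-in for an inference-oriented target: item by item."""

    def run(self, fn, inputs):
        return [fn(x) for x in inputs]


def scale_and_shift(x: float) -> float:
    # The "user program": written once, with no knowledge of the backend.
    return 2.0 * x + 1.0


def execute(backend: Backend, inputs: Sequence[float]) -> List[float]:
    return backend.run(scale_and_shift, inputs)
```

Because `execute` accepts anything that satisfies `Backend`, swapping `ThroughputBackend()` for `LatencyBackend()` changes how the work is scheduled without touching `scale_and_shift`.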

Patterns Across Companies

Both Cloudflare and Google are hitting limits where generic performance optimizations are no longer sufficient. Whether it is relaxing a database lock from exclusive to shared to handle high-concurrency query planning, or physically splitting AI accelerators to serve the competing demands of throughput and latency, scaling effectively at this level requires architectures tailored to the workload.