Engineering Reads — 2026-07-09

The Big Idea

Predicting complex system outcomes—whether estimating the long-term equilibrium of AI compute markets or debugging the interplay of LLM agents in a terminal—rarely succeeds from a purely bottom-up, theoretical approach. Instead, engineers and strategists must rely on robust instrumentation, structured runtime observation, and top-down heuristics to understand evolving behaviors before they settle into a definitive state.

Deep Reads

Ways to think about token pricing · Benedict Evans

Evans argues that the current AI supply crunch obscures the long-term economic fate of foundation models, questioning whether they will achieve sustainable pricing power or devolve into low-margin commodity infrastructure. He dismisses bottom-up modeling, such as estimating chip counts and datacenter capex, as a fool’s errand, akin to forecasting the 1998 broadband market. Instead, he proposes focusing on top-down structural questions about the durability of the frontier, market competition, and the necessity of software “wrappers” to capture value. The core insight is that, absent a massive disruption such as state regulation or unforeseen network effects, current dynamics suggest models will become commoditized layers, with value captured further up the stack. This is an essential read for anyone trying to model the unit economics of AI features or allocate infrastructure spend over the next five years.

2026-07-10

Blogs, AI, Tech

AI, ChatGPT, Augmented Reality, Privacy

Simon Willison — 2026-07-10

Highlight

Today’s standout piece highlights a sharp critique from Nilay Patel on the unavoidable privacy tradeoffs inherent to augmented reality hardware. It serves as a necessary reality check on the physical limitations of face-worn AI devices and the societal cost of continuous cloud-based processing.

Posts

Quoting Nilay Patel · Source

Simon quotes Nilay Patel on the physical limits and privacy implications of augmented reality glasses. Patel argues that because chips small enough to fit in glasses cannot handle real-time continuous video processing, the data must be sent to the cloud. This unavoidable architecture means that building the next major AR product requires invading user privacy, raising the ethical question of whether the societal costs are too high to justify building these devices at all.

2026-07-08

Blogs

Observability, AI Agents, Large Language Models, System Architecture

Engineering Reads — 2026-07-08

The Big Idea

The defining characteristic of a system’s power is often not its surface interface or compute engine, but the structure of its underlying state and context. Whether transitioning from siloed observability pillars to unified columnar databases, or recognizing that an AI agent’s true identity lives in its stateful context rather than its neural network weights, engineering leverage fundamentally comes from how we store and connect data.

2026-07-09

Blogs, AI, Tech

LLMs, OpenAI, Meta, LLM-Tool-Use, Generative-AI

Simon Willison — 2026-07-09

Highlight

The standout update today is Simon’s deep dive into the newly released GPT-5.6 family, where he unpacks OpenAI’s new API features like programmatic tool calling and analyzes their latest benchmark rivalry with Anthropic. It is a highly substantive read for developers trying to track the rapidly evolving landscape of agentic workflows and advanced API-level orchestration.

Posts

The new GPT-5.6 family: Luna, Terra, Sol · Source

OpenAI launched its GPT-5.6 flagship models in three sizes (Luna, Terra, Sol) alongside claims of superior long-running agentic performance compared to Claude Fable 5. Simon highlights the fascinating benchmark drama, noting that while Fable 5 beat GPT-5.6 Sol on SWE-Bench Pro, OpenAI recently published an article claiming that ~30% of that specific benchmark is broken. For developers, the most valuable part of the post is Simon’s exploration of new API capabilities, including a built-in multi-agent pattern, explicit prompt cache breakpoints, and “Programmatic Tool Calling” that lets models write JavaScript to orchestrate sub-tools. He also generated 18 different pelican images across the models and reasoning levels to test exact token costs.

2026-07-06

Blogs

Software Architecture, Agentic-Engineering, AI Ethics, LLM Costs

Engineering Reads — 2026-07-06

The Big Idea

The software industry’s adoption of agentic AI has decisively moved from aspirational proofs-of-concept to production reality, bringing with it a brutal reckoning with operational costs and a reaffirmation that fundamental architectural design matters more than ever. We are discovering that LLMs do not excuse bad code; rather, clean architecture is now an economic imperative measured directly in token efficiency.

Deep Reads

Fragments: July 6 · Martin Fowler

Martin Fowler’s latest dispatch from the Future of Software Development Retreat highlights a sharp pivot in the agentic engineering landscape: developers are no longer debating whether AI can write software, but are actively shipping agent-assisted code to production. However, this rapid operationalization has triggered what is being called the “Tokenpocalypse,” with enterprises seeing LLM API bills triple in less than a year, prompting extreme mitigation tactics like throttling usage or forcing models to output “caveman” syntax to minimize token footprints.

A core technical debate has emerged regarding system design: while some hope LLMs possess a “Galaxy Brain” capable of navigating spaghetti code, the prevailing consensus argues that developer experience and agent experience share the exact same underlying needs. Good modularity and clear naming conventions help agents just as much as humans, to the point where an architecture’s quality can now be quantifiably measured by how few tokens it requires to safely implement a change. Furthermore, maintaining clean, decoupled design acts as a crucial hedge against the growing risks of AI vendor lock-in, skyrocketing costs, and potential regulatory restrictions.

Practitioners evaluating or scaling agentic workflows should read this to understand why building conceptual models and cultivating “mechanical sympathy” for LLMs are replacing raw prompting as the defining skills of this new era.

2026-07-08

Blogs, AI, Tech

AI-Assisted Programming, sqlite-utils, Agentic-Engineering, LLMs

Simon Willison — 2026-07-08

Highlight

Jarred Sumner’s post on rewriting Bun from Zig to Rust using AI agents is an incredible showcase of how frontier LLMs are upending the old Spolsky rule of “never rewrite from scratch”. It is a masterclass in agentic engineering, utilizing dynamic workflows and a TypeScript conformance suite to successfully port millions of lines of code.

Posts

Rewriting Bun in Rust · Source

Jarred Sumner details the agentic engineering process of rewriting Bun from Zig to Rust to solve complex memory management issues. With a language-independent TypeScript test suite serving as the conformance suite, an agent harness powered by Claude Mythos/Fable automated the massive code translation. The project consumed 5.9 billion input tokens (around $165,000 at API pricing), a demonstration that coordinated parallel agents fundamentally change the calculus of ground-up software rewrites.
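To put the quoted figures in perspective, the two numbers imply a blended price per million tokens. The sketch below is only back-of-envelope arithmetic on the figures in the summary; it assumes the whole dollar amount is attributed to the input tokens, which the summary does not confirm (output tokens and cache discounts may be folded in), and the function name is illustrative.

```python
# Figures quoted in the summary above.
INPUT_TOKENS = 5.9e9         # 5.9 billion input tokens
REPORTED_COST_USD = 165_000  # "around $165,000 at API pricing"


def implied_rate_per_million(total_cost_usd: float, tokens: float) -> float:
    """Dollars per million tokens implied by a total cost (hypothetical helper)."""
    return total_cost_usd / (tokens / 1_000_000)


# Assumption: the full cost is attributed to input tokens alone.
rate = implied_rate_per_million(REPORTED_COST_USD, INPUT_TOKENS)
print(f"Implied blended rate: ${rate:.2f} per million input tokens")
# → Implied blended rate: $27.97 per million input tokens
```

Treat the result as a rough ceiling on the effective input rate rather than a published price.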

2026-07-04

Blogs

Observability, Software Engineering, Artificial Intelligence, Engineering Management, Developer Tools

Engineering Reads — 2026-07-04

The Big Idea

As AI drives the marginal cost of writing code to zero, the core bottleneck of software engineering is shifting entirely from generation to validation. Organizations that fail to build rigorous, unified observability and fast feedback loops will find their systems rapidly collapsing under the entropy of machine-generated code.

Deep Reads

New, faster NA · Brett Terpstra

Brett Terpstra details the rewrite of na, a command-line todo manager for TaskPaper files, from Ruby to Rust. The core motivation was eliminating the interpreter boot latency that made Ruby poorly suited for prompt hooks executing on every directory change. The Rust port achieves behavioral parity with the original gem while providing near-instantaneous execution, proving that sometimes rewriting for performance is functionally transformative. It’s a compelling case study for CLI developers on how language startup costs directly impact user experience in shell environments. Engineers building developer tools should read this to understand when to graduate from scripting languages to compiled binaries.
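The startup cost described above is easy to measure for any command. The following is a minimal sketch, not taken from the article: it times how long a process takes to launch and exit, using the Python interpreter itself as a stand-in workload because it is guaranteed to be present. It does not measure na, and `median_startup_seconds` is a hypothetical helper name.

```python
import statistics
import subprocess
import sys
import time


def median_startup_seconds(argv: list[str], runs: int = 5) -> float:
    """Median wall-clock time to launch a command and wait for it to exit."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload: an interpreter that boots and exits immediately.
interpreter_cost = median_startup_seconds([sys.executable, "-c", "pass"])
print(f"interpreter boot: {interpreter_cost * 1000:.1f} ms per invocation")
```

Swapping the argument list for a real CLI gives the latency a shell would pay on every prompt redraw or directory change if that command were wired into a hook.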

2026-07-06

Blogs, AI, Tech

SQLite, Generative-AI, Large Language Models, Developer Tools

Simon Willison — 2026-07-06

Highlight

The latest release candidate for sqlite-utils is notable not just for its subtle breaking changes like compound foreign key support, but because Simon highlights his use of cutting-edge AI assistants like Claude Fable 5 and GPT-5.5 to aggressively churn through his issue backlog.

Posts

sqlite-utils 4.0rc3 · Source

Simon pushed out a third release candidate for sqlite-utils 4.0, delaying the stable release after using Claude Fable 5 and GPT-5.5 to clear out a large backlog of pull requests and issues. The most critical update is new support for introspecting and creating compound foreign keys, which requires a subtle breaking change to the table.foreign_keys Python API. Additionally, the tool now properly follows SQLite’s convention for case-insensitive column names, an update that affected numerous parts of the codebase.
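For readers unfamiliar with the two SQLite behaviors mentioned above, here is a small sketch using only Python's standard `sqlite3` module. It deliberately does not use sqlite-utils, whose new API shape is not described in the summary. The `albums` and `tracks` tables and the `foreign_keys` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE albums (
        artist TEXT,
        title  TEXT,
        PRIMARY KEY (artist, title)
    );
    CREATE TABLE tracks (
        id           INTEGER PRIMARY KEY,
        album_artist TEXT,
        album_title  TEXT,
        name         TEXT,
        FOREIGN KEY (album_artist, album_title)
            REFERENCES albums (artist, title)
    );
    """
)


def foreign_keys(conn, table):
    """Group PRAGMA foreign_key_list rows into one entry per constraint.

    The table name is interpolated directly, so pass trusted names only.
    """
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    grouped = {}
    # Row layout: id, seq, table, from, to, on_update, on_delete, match.
    # Rows sharing an id belong to the same (possibly compound) key.
    for fk_id, _seq, other, from_col, to_col, *_ in sorted(
        rows, key=lambda r: (r[0], r[1])
    ):
        entry = grouped.setdefault(fk_id, (other, [], []))
        entry[1].append(from_col)
        entry[2].append(to_col)
    return list(grouped.values())


fks = foreign_keys(conn, "tracks")
print(fks)
# → [('albums', ['album_artist', 'album_title'], ['artist', 'title'])]

# Column names are case-insensitive in SQLite, so this query is valid.
conn.execute("SELECT NAME, ALBUM_TITLE FROM tracks").fetchall()
```

A single-column view of foreign keys cannot represent the two-column constraint above as one relationship, which is presumably why supporting it forces an API change.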

2026-04-03

Blogs

AI Agents, Version Control, Recommendation Systems, Vector Search, Orchestration

Engineering Reads — 2026-04-03

The Big Idea

Relying purely on probabilistic systems—whether that means the unconstrained memory of LLM agents or pure vector search for recommendations—inevitably breaks down in production. Real-world systems require hard data constraints, from backing agent state with SQL-queryable Git ledgers to tempering semantic similarity with exact algorithmic keyword matching.
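As one illustration of tempering semantic similarity with exact keyword matching, the sketch below blends a cosine score with a literal term-overlap score. Everything here is an assumption for demonstration purposes: the two-dimensional vectors are toy values rather than real embeddings, and the function names and the `alpha` weight are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of query terms that appear verbatim in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_score(query, query_vec, doc_text, doc_vec, alpha=0.5):
    """Weighted blend: alpha for semantic similarity, the rest for exact terms."""
    semantic = cosine(query_vec, doc_vec)
    exact = keyword_overlap(query, doc_text)
    return alpha * semantic + (1 - alpha) * exact


query, query_vec = "rust cli", [1.0, 0.0]
docs = {
    "exact": ("a fast rust cli tool", [0.8, 0.6]),
    "vague": ("systems programming utilities", [1.0, 0.0]),
}

by_vector = max(docs, key=lambda k: cosine(query_vec, docs[k][1]))
by_hybrid = max(docs, key=lambda k: hybrid_score(query, query_vec, *docs[k]))
print(by_vector, by_hybrid)
```

With these toy values, pure vector ranking prefers the document that never mentions the query terms, while the blended score prefers the one that does.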

Deep Reads

Gas Town: from Clown Show to v1.0 · Steve Yegge · Medium

LLM agents suffer from progressive dementia and a lack of working memory, fundamentally limiting their long-horizon planning capabilities. Yegge argues that the solution is a persistent, queryable data plane called “Beads,” which serves as an unopinionated memory system and universal ledger for agent work. By migrating from a fragile SQLite and JSONL architecture to Dolt (a SQL database with Git-like versioning), the system eliminates race conditions and merge conflicts, providing a complete historical log of every agent action. This shifts the orchestration paradigm from reading scrolling walls of raw text output by monolithic agents to interacting with a high-level supervisor interface that manages state deterministically. Engineers building multi-agent workflows should read this to understand why robust state management, deterministic save-games, and audit trails are more critical than raw agent reasoning.
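The idea of a queryable ledger of agent actions can be sketched in a few lines of SQL. This is not the Beads schema and not Dolt: it uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere, it does not reproduce Dolt's Git-like versioning, and the table, column, and function names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ledger (
        seq    INTEGER PRIMARY KEY AUTOINCREMENT,
        agent  TEXT NOT NULL,
        task   TEXT NOT NULL,
        action TEXT NOT NULL,
        detail TEXT NOT NULL DEFAULT ''
    )
    """
)


def record(agent, task, action, detail=""):
    """Append one action; rows are only ever inserted, never updated."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ledger (agent, task, action, detail) VALUES (?, ?, ?, ?)",
            (agent, task, action, detail),
        )


def current_state():
    """Latest recorded action per task, derived from the log with a query."""
    rows = conn.execute(
        """
        SELECT task, action FROM ledger
        WHERE seq IN (SELECT MAX(seq) FROM ledger GROUP BY task)
        ORDER BY task
        """
    ).fetchall()
    return dict(rows)


record("worker-1", "task-42", "claimed")
record("worker-1", "task-42", "edited", "src/parser.py")
record("worker-2", "task-43", "claimed")
record("worker-1", "task-42", "closed")

print(current_state())
# → {'task-42': 'closed', 'task-43': 'claimed'}
```

Because state is derived from an append-only log, a supervisor can answer "what is each task's status?" or "what did worker-1 touch?" with a query instead of re-reading agent transcripts.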

2026-04-03

Blogs, AI, Tech

Security, Generative-AI, AI-Security-Research, Open-Source, Social-Engineering

Simon Willison — 2026-04-03

Highlight

The overarching theme today is the sudden, step-function improvement in AI-driven vulnerability research. Major open-source maintainers are simultaneously reporting that the era of “AI slop” security reports has ended, replaced by an overwhelming tsunami of highly accurate, AI-generated bug discoveries that are drastically changing the economics of exploit development.

Posts

Vulnerability Research Is Cooked · Source

Highlighting Thomas Ptacek’s commentary, Simon notes that frontier models are uniquely suited for exploit development due to their baked-in knowledge of bug classes, massive context of source code, and pattern-matching capabilities. Since LLMs never get bored constraint-solving for exploitability, agents that are simply pointed at source trees to search for zero-days are set to drastically alter the security landscape. Simon is tracking this trend closely enough that he just created a dedicated ai-security-research tag to follow it.