2026-04-05

Simon Willison — 2026-04-05#

Highlight#

Simon highlights a deep-dive post by Lalit Maganti on the realities of “agentic engineering” when building a robust SQLite parser. The piece beautifully articulates a crucial lesson for our space: while AI is incredible at plowing through tedious low-level implementation details, it struggles significantly with high-level design and architectural decisions where there isn’t an objectively right answer.

Posts#

Eight years of wanting, three months of building with AI Simon shares a standout piece of long-form writing by Lalit Maganti on the process of building syntaqlite, a parser and formatter for SQLite. Claude Code was instrumental in overcoming the initial hurdle of implementing 400+ tedious grammar rules, allowing Lalit to rapidly vibe-code a working prototype. However, the post cautions that relying on AI for architectural design led to deferred decisions and a confusing codebase, ultimately requiring a complete rewrite with more human-in-the-loop decision making. The core takeaway is that while AI excels at tasks with objectively checkable answers, it remains weak at subjective design and system architecture.

2026-04-06

Simon Willison — 2026-04-06#

Highlight#

The most substantial update today is Simon’s look at the Google AI Edge Gallery, an official iOS app for running local Gemma 4 models directly on-device. It stands out as a major milestone for local AI, being the first time a local model vendor has shipped an official iPhone app with built-in tool-calling capabilities.

Posts#

Google AI Edge Gallery Simon highlights Google’s strangely-named but highly effective official iOS app for running Gemma 4 (and 3) models natively. The 2.54GB E2B model runs fast and includes features like vision, up to 30 seconds of audio transcription, and an impressive “skills” demo showcasing tool calling against eight different HTML widgets. Despite a minor app freeze bug and the unfortunate lack of permanent chat logs, Simon considers it a significant release as the first official iOS app from a local model vendor.

2026-04-07

Engineering Reads — 2026-04-07#

The Big Idea#

The defining engineering challenge of our time isn’t just writing logic—it’s managing the friction between abstraction layers. Whether you are evolving storage interfaces to reduce data friction, stripping away software abstractions to respect hardware cache lines, or using standardized protocols to finally introspect opaque build systems, effective systems design requires knowing exactly when to hide the underlying machinery and when to expose it.

2026-04-07

Simon Willison — 2026-04-07#

Highlight#

Anthropic’s decision to restrict access to their new Claude Mythos model underscores a massive, sudden shift in AI capabilities. It is a fascinating look at an industry-wide reckoning as open-source maintainers transition from dealing with “AI slop” to facing a tsunami of highly accurate, sophisticated vulnerability reports.

Posts#

[Anthropic’s Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me] · Source Anthropic has delayed the general release of Claude Mythos, a general-purpose model similar to Claude Opus 4.6, opting instead to limit access to trusted partners under “Project Glasswing” so they can patch foundational internet systems. Simon digs into the context, tracking how credible security professionals are warning about the ability of frontier LLMs to chain multiple minor vulnerabilities into sophisticated exploits. He even uses git blame to independently verify a 27-year-old OpenBSD kernel bug discovered by the model. He concludes that delaying the release until new safeguards are built, while providing $100M in credits to defenders, is a highly reasonable trade-off.

2026-04-08

Engineering Reads — 2026-04-08#

The Big Idea#

True progression in engineering and personal mastery isn’t found in adopting flashy shortcuts or chasing peak experiences, but in the unglamorous, structural integration of daily practices. Whether you are systematizing a team’s AI usage into shared artifacts or finding contemplative focus in the architecture of a clean API, the deep work happens in the quiet consistency of the everyday.

Deep Reads#

Feedback Flywheel · Rahul Garg Garg tackles the friction inherent in AI-assisted development by proposing a structured mechanism to harvest and distribute knowledge. The core mechanism involves taking the isolated learnings developers glean from individual AI sessions and feeding them back into the team’s shared artifacts. Instead of relying on isolated developer interactions, this process transforms solitary prompt engineering into a compounding collective asset. The tradeoff requires spending deliberate effort on process overhead rather than just writing code, but it elevates the organization’s baseline capabilities over time. Engineering leaders wrestling with how to systematically scale AI tooling beyond individual silos should read this to understand the mechanics of continuous improvement.

2026-04-08

Simon Willison — 2026-04-08#

Highlight#

The most substantial piece today is a deep-dive into Meta’s new Muse Spark model and its chat harness, where Simon successfully extracts the platform’s system tool definitions via direct prompting. His exploration of Meta’s built-in Python Code Interpreter and visual_grounding capabilities highlights a powerful, sandbox-driven approach to combining generative AI with programmatic image analysis and exact object localization.

Posts#

Meta’s new model is Muse Spark, and meta.ai chat has some interesting tools Meta has launched Muse Spark, a new hosted model currently accessible as a private API preview and directly via the meta.ai chat interface. By simply asking the chat harness to list its internal tools and their exact parameters, Simon documented 16 different built-in tools. Standouts include a Python Code Interpreter (container.python_execution) running Python 3.9 and SQLite 3.34.1, mechanisms for creating web artifacts, and a highly capable container.visual_grounding tool. He ran hands-on experiments generating images of a raccoon wearing trash, then used the platform’s Python sandbox and grounding tools to extract precise, nested bounding boxes and perform object counts (like counting whiskers or his classic pelicans). Although the model is closed for now, infrastructure scaling and comments from Alexandr Wang suggest future versions could be open-sourced.

2026-04-09

Engineering Reads — 2026-04-09#

The Big Idea#

AI is shifting the bottleneck of software engineering from writing syntax to exercising taste and defining specifications. Whether it’s iterating on high-level specs for autonomous agents, evaluating generated APIs, or ruthlessly discarding over-engineered platforms for boring architecture, the defining engineering skill is now human judgment, not raw keystrokes.

Deep Reads#

Fragments: April 9 · Martin Fowler Fowler’s fragment touches on several current events, but the technical meat lies in his analysis of Lalit Maganti’s attempt to build an SQLite parser using Claude. The core insight is that AI excels at generating code with objectively checkable answers, like passing test suites, but fails catastrophically at public API design because it fundamentally lacks “taste”. Maganti’s first AI-driven iteration produced complete spaghetti code; his successful second attempt relied heavily on continuous human-led refactoring and using the AI for targeted restructuring rather than blind generation. This exposes a critical tradeoff in the current AI era: coding agents can blast through long-standing architectural “todo piles,” but human engineers must remain tightly in the loop to judge whether an interface is actually pleasant to use. Engineers exploring AI-assisted development should read this to understand where to effectively deploy agents and where to stubbornly rely on their own architectural judgment.

2026-04-09

Simon Willison — 2026-04-09#

Highlight#

Today’s most substantive update is the release of asgi-gzip 0.3, which serves as a great practical reminder of the hidden risks in automated maintenance workflows. A silently failing GitHub Action caused his library to miss a crucial upstream Starlette fix for Server-Sent Events (SSE) compression, which ended up breaking a new Datasette feature in production.

Posts#

[asgi-gzip 0.3] · Source Simon released an update to asgi-gzip after a production deployment of a new Server-Sent Events (SSE) feature for Datasette ran into trouble. The root cause was datasette-gzip incorrectly compressing event/text-stream responses. The library relies on a scheduled GitHub Actions workflow to port updates from Starlette, but the action had stopped running and missed Starlette’s upstream fix for this exact issue. By running the workflow and integrating the fix, both datasette-gzip and asgi-gzip now handle SSE responses correctly.

2026-04-10

Engineering Reads — 2026-04-10#

The Big Idea#

As AI abstractions upend our relationship with code, engineering craft is bifurcating: we must simultaneously grapple with emergent, functional behaviors in massive models while deliberately preserving the mechanical, systems-level intuition that historically grounded software ethics.

Deep Reads#

watgo - a WebAssembly Toolkit for Go · Eli Bendersky This piece introduces watgo, a zero-dependency WebAssembly toolkit written in pure Go that parses, validates, encodes, and decodes WASM. The core of the system lowers WebAssembly Text (WAT) to a semantic intermediate representation called wasmir, flattening syntactic sugar to match WASM’s strict binary execution semantics. To guarantee correctness, watgo executes the official 200K-line WebAssembly specification test suite by converting .wast files to binary and running them against a Node.js harness. An earlier attempt to maintain a pure-Go execution pipeline using wazero was abandoned because the runtime lacked support for recent WASM garbage collection proposals. Engineers working on compilers, parsers, or WebAssembly infrastructure should read this for a masterclass in leveraging specification test suites to bootstrap confidence in new tooling.

2026-04-10

Simon Willison — 2026-04-10#

Highlight#

Simon points out the non-obvious reality that ChatGPT’s Advanced Voice Mode is actually running on an older, weaker model compared to their flagship developer tools. Drawing on insights from Andrej Karpathy, he highlights the widening capability gap between consumer-facing voice interfaces and B2B-focused reasoning models that benefit from verifiable reinforcement learning.

Posts#

ChatGPT voice mode is a weaker model Simon reflects on the counterintuitive fact that OpenAI’s Advanced Voice Mode runs on a GPT-4o era model with an April 2024 knowledge cutoff. Prompted by a tweet from Andrej Karpathy, he contrasts this consumer feature with top-tier coding models capable of coherently restructuring entire codebases or finding system vulnerabilities. Karpathy notes this divergence in capabilities exists because coding tasks offer explicit, verifiable reward functions ideal for reinforcement learning and hold significantly more B2B value.