Week 17 Summary

Simon Willison — Week of 2026-04-11 to 2026-04-17

Highlight of the Week

This week’s most striking revelation came from Simon’s infamous “pelican riding a bicycle” SVG generation benchmark, where a 21 GB quantized local model (Qwen3.6-35B-A3B) unexpectedly outperformed Anthropic’s brand-new flagship, Claude Opus 4.7. Running locally on a MacBook Pro via LM Studio, Qwen drew a better bicycle frame and also won a secret backup test involving a unicycle. That led Simon to conclude that the long-standing correlation between his joke benchmark and general model utility has finally broken down.

Week 19 Summary

Simon Willison — Week of 2026-04-18 to 2026-05-01

Highlight of the Week

The alpha release of llm 0.32a0 marks a foundational architectural shift for Simon’s ecosystem of CLI tools. By moving from a simple text-in/text-out abstraction to one that natively models message sequences and typed streams, the library is better positioned to handle how modern frontier models actually behave. This opens the door to integrating server-side tool calls, multi-modal inputs, and reasoning tokens.
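
To make the difference between “text in, text out” and “messages in, typed stream out” concrete, here is a minimal Python sketch. Every class and function name in it is hypothetical and chosen for illustration; it is not llm’s actual 0.32a0 API, and the model is a fake that yields canned chunks.

```python
from dataclasses import dataclass, field
from typing import Iterator, Literal, Union

# Hypothetical types for illustration only; these are NOT llm's 0.32a0 classes.


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    url: str


@dataclass
class Message:
    role: Literal["system", "user", "assistant", "tool"]
    parts: list[Union[TextPart, ImagePart]] = field(default_factory=list)


@dataclass
class TextChunk:
    text: str


@dataclass
class ReasoningChunk:
    text: str


@dataclass
class ToolCallChunk:
    name: str
    arguments: dict


StreamChunk = Union[TextChunk, ReasoningChunk, ToolCallChunk]


def fake_model(messages: list[Message]) -> Iterator[StreamChunk]:
    """Stand-in for a real model call: yields typed chunks rather than plain strings."""
    yield ReasoningChunk("User wants the weather; call a tool first.")
    yield ToolCallChunk("get_weather", {"city": "Exeter"})
    yield TextChunk("Checking the weather in Exeter...")


def collect(stream: Iterator[StreamChunk]) -> dict:
    """Route each chunk by its type so reasoning never leaks into the visible answer."""
    out: dict = {"text": "", "reasoning": "", "tool_calls": []}
    for chunk in stream:
        if isinstance(chunk, TextChunk):
            out["text"] += chunk.text
        elif isinstance(chunk, ReasoningChunk):
            out["reasoning"] += chunk.text
        elif isinstance(chunk, ToolCallChunk):
            out["tool_calls"].append((chunk.name, chunk.arguments))
    return out


conversation = [
    Message("system", [TextPart("You are a helpful assistant.")]),
    Message("user", [TextPart("What's the weather in Exeter?")]),
]
result = collect(fake_model(conversation))
```

In this sketch, consumers dispatch on chunk type instead of parsing one undifferentiated string, which is what lets reasoning tokens and tool calls coexist with ordinary text output.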

Week 20 Summary

Engineering Reads — Week of 2026-05-07 to 2026-05-15

Week in Review

This week’s engineering discourse reflects a mature industry grappling with system boundaries and human intent. From constraining unpredictable AI integrations into strictly bounded functional workflows to leveraging organizational psychology to structure open-source compiler architecture, practitioners are aggressively reclaiming control over non-determinism. We are seeing a distinct pushback against buzzword-driven hype in favor of operational stability, rigorous domain modeling, and trusting native web standards over heavyweight abstractions.

2026-04-15

Simon Willison — 2026-04-15

Highlight

The standout exploration today is Simon’s hands-on dive into Google’s new Gemini 3.1 Flash TTS API. It perfectly captures his rapid-prototyping ethos: encountering a surprisingly complex new prompting paradigm for an audio model and immediately using Gemini 3.1 Pro to “vibe code” a UI to stress-test regional British accents.

Posts

Gemini 3.1 Flash TTS

Google released Gemini 3.1 Flash TTS, an audio-only output model controlled via standard Gemini API prompts. Simon points out that the prompting guide is highly unusual, so he put it to the test by prompting for charismatic Newcastle and Exeter accents. To speed up his experimentation, he used Gemini 3.1 Pro to instantly vibe code a custom UI for the API.

2026-04-30

Simon Willison — 2026-04-30

Highlight

The most fascinating discussion today centers on the cultural clash between AI-assisted programming and traditional open-source community building, specifically the Zig project’s strict ban on LLM-authored contributions. It articulates a growing divide: even when AI generates good code, it breaks the “contributor poker” investment model that maintainers rely on to grow trusted human collaborators over time.

Posts

The Zig project’s rationale for their firm anti-AI contribution policy

Simon dives into Zig’s stringent anti-LLM policy covering issues, PRs, and bug tracker comments. He highlights Loris Cro’s concept of “contributor poker,” which argues that open-source maintainers invest in people, not just in their initial code contributions. Because reviewing an LLM-assisted PR doesn’t help the project cultivate a new, confident contributor, the maintainer’s time is effectively wasted. Interestingly, the policy means that Bun, the Anthropic-acquired JavaScript runtime built on a Zig fork, is keeping a 4x compile performance improvement out of upstream Zig because of its heavy use of AI.

2026-05-03

Engineering Reads — 2026-05-03

The Big Idea

Effective error reporting often demands a shift in perspective: instead of decorating each error at the point of failure, we can declare context along the happy path and have it emitted only when an error unwinds through the enclosing block. This telescopic, block-scoped approach minimizes developer friction, though it raises new challenges when expected errors (like I/O cancellation) are caught and handled by a caller rather than fatally reported.

Deep Reads

Minimal Viable Zig Error Contexts · Matklad · matklad.github.io

Zig’s strongly-typed error codes solve error handling, but its idiomatic “Diagnostics sink” pattern for error reporting introduces too much friction for lightweight or script-like code. To avoid both the poor debuggability of naked try statements and the verbosity of custom error wrappers, Matklad proposes a “worse-is-better” pattern that logs key-value context via errdefer at the block level. This builds up a telescopic context across the call stack without cluttering the happy path or requiring changes to individual fallible operations. The technique has a significant tradeoff, however: it logs context unconditionally, even if a caller later handles the error gracefully. That is a problem in Zig 0.16, where I/O cancellation is treated as an expected, recoverable error. Systems engineers and language designers should read this for a practical exploration of how the ergonomics of context gathering shape the readability of our code.
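
Python has no errdefer, but the shape of the pattern can be approximated with a context manager that records key-value context only when an exception escapes its block. The sketch below is such a Python analogue, not Matklad’s Zig code; `error_context`, `error_log`, `parse_line`, and `load_config` are hypothetical names. Nesting the blocks produces the telescopic effect (innermost context first), and the final `try` reproduces the tradeoff: the context is logged even though the caller recovers from the error.

```python
import contextlib
from typing import Iterator

# Hypothetical stand-in for a real logger so the recorded context is easy to inspect.
error_log: list[str] = []


@contextlib.contextmanager
def error_context(**context: object) -> Iterator[None]:
    """Rough Python analogue of a block-level `errdefer log(...)` in Zig."""
    try:
        yield
    except BaseException:
        # Runs only on the error path (BaseException also covers cancellation,
        # e.g. asyncio.CancelledError), records the context, then re-raises.
        error_log.append(" ".join(f"{key}={value}" for key, value in context.items()))
        raise


def parse_line(line: str, lineno: int) -> int:
    with error_context(lineno=lineno):
        return int(line)  # the fallible call itself needs no extra decoration


def load_config(path: str, text: str) -> list[int]:
    with error_context(path=path):
        return [parse_line(line, i + 1) for i, line in enumerate(text.splitlines())]


try:
    load_config("app.cfg", "1\n2\nthree")
except ValueError:
    pass  # the caller recovers, but both contexts were already logged

print(error_log)  # ['lineno=3', 'path=app.cfg']
```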

2026-05-08

Engineering Reads — 2026-05-08

The Big Idea

Code formatters should amplify developer intent rather than blindly override it. Tools that rely on subtle syntactic cues to steer layout often yield cleaner, more readable code than rigid, algorithmically driven alternatives.

Deep Reads

Steering Zig Fmt · Matklad · matklad.github.io

The core insight here is that zig fmt outperforms rigid alternatives like rustfmt or deno fmt because it is uniquely “steerable”. Rather than applying a strict layout heuristic, the tool relies on developer-provided cues, such as a trailing comma, to toggle a function call between single-line and multi-line layouts. It even handles columnar alignment for arrays by mirroring the developer’s first line break, and allows a varying number of items per line via concatenation operators like ++. The underlying philosophy acknowledges a subtle tradeoff: while total automation eliminates stylistic arguments, it destroys semantic grouping, since the best formatting relies heavily on logical blocks and intermediate variables that machines cannot infer. By leaning into human choices rather than eliminating them, the tool strikes a pragmatic balance. Anyone building developer tooling or designing language ergonomics should read this to understand why leaving room for human intent often yields a better developer experience.
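
As a rough illustration of steerable layout, the toy Python functions below treat a trailing comma as the signal for multi-line output, and use the number of items on the author’s first line as the row width for an array. This is an assumption-laden sketch, not zig fmt’s actual algorithm: `format_call` and `format_array` are hypothetical, split naively on commas (so nested calls are out of scope), and pad every column to one shared width.

```python
def format_call(name: str, raw_args: str, indent: str = "    ") -> str:
    """Toy steerable layout: a trailing comma requests one argument per line."""
    stripped = raw_args.strip()
    trailing_comma = stripped.endswith(",")
    # Naive comma split: nested calls or string literals containing commas are not handled.
    args = [a.strip() for a in stripped.split(",") if a.strip()]
    if trailing_comma:
        body = "".join(f"{indent}{arg},\n" for arg in args)
        return f"{name}(\n{body})"
    return f"{name}({', '.join(args)})"


def format_array(raw_items: str, indent: str = "    ") -> str:
    """Toy columnar layout: the item count on the author's first line sets the row width."""
    first_line = raw_items.strip().splitlines()[0]
    per_row = len([x for x in first_line.split(",") if x.strip()])
    items = [x.strip() for x in raw_items.split(",") if x.strip()]
    # Pad every cell to one shared width so the columns line up.
    width = max(len(x) for x in items) + 1  # +1 for the trailing comma
    rows = [items[i:i + per_row] for i in range(0, len(items), per_row)]
    lines = [indent + " ".join(f"{x},".ljust(width) for x in row).rstrip() for row in rows]
    return "{\n" + "\n".join(lines) + "\n}"


print(format_call("foo", "a, b, c"))   # foo(a, b, c)
print(format_call("foo", "a, b, c,"))  # one argument per line
print(format_array("1, 20, 3,\n400, 5, 6,"))
```

Both functions read the layout decision from the input the developer wrote rather than computing it from a line-length limit, which is the property the post argues makes a formatter steerable.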