Week 15 Summary

Simon Willison — Week of 2026-04-04 to 2026-04-10#

Highlight of the Week#

Anthropic’s decision to delay the general release of their highly capable Claude Mythos model under “Project Glasswing” marks a significant turning point in the AI industry. The move underscores a massive shift in frontier model capabilities, as models evolve from generating text to autonomously chaining multiple minor vulnerabilities into sophisticated exploits, requiring a new level of security safeguards before release.

Week 17 Summary

Simon Willison — Week of 2026-04-11 to 2026-04-17#

Highlight of the Week#

This week’s most striking revelation came from Simon’s infamous “pelican riding a bicycle” SVG generation benchmark, where a 21GB quantized local model (Qwen3.6-35B-A3B) unexpectedly outperformed Anthropic’s brand-new Claude Opus 4.7 flagship. Running locally on a MacBook Pro via LM Studio, Qwen generated a better bicycle frame and even won a secret unicycle backup test, leading Simon to conclude that his joke benchmark’s long-standing correlation with general model utility has finally broken down.

Week 19 Summary

Simon Willison — Week of 2026-04-18 to 2026-05-01#

Highlight of the Week#

The alpha release of llm 0.32a0 marks a foundational architectural pivot for Simon’s ecosystem of CLI tools. By moving away from a simple text-in/text-out abstraction to one that natively models complex message sequences and typed streams, the library is now future-proofed to handle the realities of modern frontier models. This opens the door for seamless integration of server-side tool calls, multi-modal inputs, and reasoning tokens.

Week 20 Summary

Simon Willison — Week of 2026-05-08 to 2026-05-15#

Highlight of the Week#

The standout development this week is Simon’s rapid adaptation to the latest frontier model capabilities, most notably releasing llm 0.32a2 to expose and visualize the new interleaved reasoning tokens of GPT-5 class models directly in the terminal. This perfectly pairs with his hands-on explorations of embedding LLM calls deeply into developer workflows, such as executing prompts via script shebangs and leveraging models to output rich HTML rather than just Markdown.

2026-04-05

Simon Willison — 2026-04-05#

Highlight#

Simon highlights a deep-dive post by Lalit Maganti on the realities of “agentic engineering” when building a robust SQLite parser. The piece beautifully articulates a crucial lesson for our space: while AI is incredible at plowing through tedious low-level implementation details, it struggles significantly with high-level design and architectural decisions where there isn’t an objectively right answer.

Posts#

Eight years of wanting, three months of building with AI Simon shares a standout piece of long-form writing by Lalit Maganti on the process of building syntaqlite, a parser and formatter for SQLite. Claude Code was instrumental in overcoming the initial hurdle of implementing 400+ tedious grammar rules, allowing Lalit to rapidly vibe-code a working prototype. However, the post cautions that relying on AI for architectural design led to deferred decisions and a confusing codebase, ultimately requiring a complete rewrite with more human-in-the-loop decision making. The core takeaway is that while AI excels at tasks with objectively checkable answers, it remains weak at subjective design and system architecture.

2026-04-13

Simon Willison — 2026-04-13#

Highlight#

Today’s standout is Simon’s hands-on research into the newly released servo crate using Claude Code. It perfectly captures his classic approach to AI-assisted exploration, demonstrating how quickly you can prototype a Rust CLI tool and evaluate WebAssembly compatibility with an LLM sidekick.

Posts#

[Exploring the new servo crate] · Source Following the initial release of the embeddable servo browser engine on crates.io, Simon tasked Claude Code for web with exploring its capabilities. The AI successfully generated a working Rust CLI tool called servo-shot for taking web screenshots. While compiling Servo itself to WebAssembly proved unfeasible due to its heavy use of threads and SpiderMonkey dependencies, Claude instead built a playground page utilizing a WebAssembly build of the html5ever and markup5ever_rcdom crates to parse HTML fragments.

2026-04-30

Simon Willison — 2026-04-30#

Highlight#

The most fascinating discussion today centers on the cultural clash between AI-assisted programming and traditional open-source community building, specifically looking at the Zig project’s strict ban on LLM-authored contributions. It perfectly articulates a growing divide: while AI can generate perfect code, it breaks the “contributor poker” investment model that maintainers rely on to grow trusted human collaborators over time.

Posts#

The Zig project’s rationale for their firm anti-AI contribution policy Simon dives into Zig’s stringent anti-LLM policy for issues, PRs, and bug tracker comments. He highlights Loris Cro’s concept of “contributor poker,” which argues that open-source maintainers invest in people, not just their initial code contributions. Because reviewing an LLM-assisted PR doesn’t help the project cultivate a new, confident contributor, the maintainer’s time is wasted. Interestingly, this policy means that Bun—an Anthropic-acquired JavaScript runtime built on a Zig fork—is keeping a massive 4x compile performance improvement un-upstreamed due to their heavy use of AI.

2026-05-13

Simon Willison — 2026-05-13#

Highlight#

Simon’s standout experiment today demonstrates a clever UX workaround for sandboxed iframes, intercepting Content Security Policy (CSP) errors and passing them to the parent window for user approval. It is a great example of his hands-on AI-assisted programming, notably built using GPT-5.5 xhigh in the Codex desktop app.

Posts#

[CSP Allow-list Experiment] · Source This technical experiment explores how to load an app within a CSP-protected sandboxed iframe while maintaining a smooth user experience. Simon implemented a custom fetch() that catches CSP errors and passes them up to the parent window. The parent window can then prompt the user to add the blocked domain to an allow-list before refreshing the page. He built the tool using GPT-5.5 xhigh via the Codex desktop app.