Week 15 Summary

Simon Willison — Week of 2026-04-04 to 2026-04-10

Highlight of the Week

Anthropic’s decision to delay the general release of its Claude Mythos model under “Project Glasswing” marks a turning point for the AI industry. The move reflects a shift in frontier model capabilities: models have gone from generating text to autonomously chaining multiple minor vulnerabilities into sophisticated exploits, which calls for a new level of security safeguards before release.

Week 17 Summary

Simon Willison — Week of 2026-04-11 to 2026-04-17

Highlight of the Week

This week’s most striking revelation came from Simon’s infamous “pelican riding a bicycle” SVG generation benchmark, where a 21GB quantized local model (Qwen3.6-35B-A3B) unexpectedly outperformed Anthropic’s brand-new Claude Opus 4.7 flagship. Running locally on a MacBook Pro via LM Studio, Qwen generated a better bicycle frame and even won a secret unicycle backup test, leading Simon to conclude that his joke benchmark’s long-standing correlation with general model utility has finally broken down.

Week 20 Summary

Simon Willison — Week of 2026-05-08 to 2026-05-15

Highlight of the Week

The standout development this week is Simon’s rapid adaptation to the latest frontier model capabilities, most notably the release of llm 0.32a2, which exposes and visualizes the new interleaved reasoning tokens of GPT-5-class models directly in the terminal. This pairs well with his hands-on explorations of embedding LLM calls into developer workflows, such as executing prompts via script shebangs and having models output rich HTML rather than just Markdown.

2026-05-27

Simon Willison — 2026-05-27

Highlight

Simon makes a compelling case that April 2026 marks an inflection point at which frontier AI labs found true product-market fit with coding agents. Analyzing sudden enterprise pricing pivots, sales hiring sprees, and large inference compute deals, he shows how enterprise adoption of AI agents is finally turning heavy usage into real revenue.

Posts

I think Anthropic and OpenAI have found product-market fit

Simon argues that the sudden shift by OpenAI and Anthropic to charge enterprise customers full API token prices for agent usage signals true product-market fit. He notes that heavy coding agent users easily burn thousands of dollars in token equivalents, prompting the labs to pivot away from middlemen like Cursor or Copilot and capture this enterprise value directly. The piece features some classic Simon dogfooding (using Claude Code and Datasette Agent to analyze AI lab job listings) and highlights a SpaceX S-1 filing that reveals Anthropic’s $1.25 billion monthly compute spend.

2026-05-21

Simon Willison — 2026-05-21

Highlight

The major news today is the official announcement of Datasette Agent, which merges Simon’s three years of work on the LLM library with Datasette to create an extensible, conversational AI assistant for querying data. It is a major milestone for his ecosystem, letting users query their databases in natural language and build custom tools on a new plugin architecture.

Posts

Datasette Agent

Simon officially announced Datasette Agent, a conversational AI interface that lets users ask questions of the data stored in Datasette. The post features a live demo in which Gemini 3.1 Flash-Lite successfully queries a blog database to find a bird-watching record. He highlights a growing plugin ecosystem (including charts, image generation, and sandbox execution) and notes that tools like Claude Code and OpenAI Codex are proving excellent at writing these extensions. Looking ahead, Simon teased a major refactor of his LLM library, a Claude Artifacts-style plugin, and a personal AI assistant named “Claw” built on his older Dogsheep tools.
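To make the idea of a conversational interface over a database more concrete, here is a minimal sketch of the kind of tool such an assistant might call: a function that runs SQL against a connection that has been switched to read-only. This is not Datasette Agent's implementation or plugin API. The `entries` table, the `run_readonly_sql` name, and the result format are all hypothetical, and only Python's standard `sqlite3` module is used.

```python
import sqlite3

# Hypothetical blog database. Autocommit mode keeps the demo free of open transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute(
    "INSERT INTO entries (title, body) VALUES (?, ?), (?, ?)",
    ("Weeknotes", "Shipped a plugin.", "Bird walk", "Saw a pelican at the harbor."),
)
# From here on, SQLite rejects any statement that would modify the database.
conn.execute("PRAGMA query_only = ON")


def run_readonly_sql(conn, sql, max_rows=20):
    """Hypothetical agent tool: run SQL and return a JSON-friendly result."""
    try:
        cursor = conn.execute(sql)
        columns = [col[0] for col in (cursor.description or [])]
        rows = [list(row) for row in cursor.fetchmany(max_rows)]
        return {"ok": True, "columns": columns, "rows": rows}
    except sqlite3.Error as exc:
        # Return the error as data so the calling model can read it and retry.
        return {"ok": False, "error": str(exc)}


found = run_readonly_sql(conn, "SELECT title FROM entries WHERE body LIKE '%pelican%'")
blocked = run_readonly_sql(conn, "DELETE FROM entries")
remaining = run_readonly_sql(conn, "SELECT count(*) AS n FROM entries")
```

Returning errors as data rather than raising them is a design choice that suits tool-calling loops, since the model can see what went wrong and adjust its next query.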

2026-04-05

Simon Willison — 2026-04-05

Highlight

Simon highlights a deep-dive post by Lalit Maganti on the realities of “agentic engineering” when building a robust SQLite parser. The piece articulates a crucial lesson for the field: while AI is very good at plowing through tedious low-level implementation details, it struggles with high-level design and architectural decisions that have no objectively right answer.

Posts

Eight years of wanting, three months of building with AI

Simon shares a standout piece of long-form writing by Lalit Maganti on the process of building syntaqlite, a parser and formatter for SQLite. Claude Code was instrumental in overcoming the initial hurdle of implementing 400+ tedious grammar rules, allowing Lalit to rapidly vibe-code a working prototype. However, the post cautions that relying on AI for architectural design led to deferred decisions and a confusing codebase, ultimately requiring a complete rewrite with more human-in-the-loop decision making. The core takeaway is that while AI excels at tasks with objectively checkable answers, it remains weak at subjective design and system architecture.

2026-04-08

Simon Willison — 2026-04-08

Highlight

The most substantial piece today is a deep-dive into Meta’s new Muse Spark model and its chat harness, where Simon successfully extracts the platform’s system tool definitions via direct prompting. His exploration of Meta’s built-in Python Code Interpreter and visual_grounding capabilities highlights a powerful, sandbox-driven approach to combining generative AI with programmatic image analysis and exact object localization.

Posts

Meta’s new model is Muse Spark, and meta.ai chat has some interesting tools

Meta has launched Muse Spark, a new hosted model currently accessible as a private API preview and directly via the meta.ai chat interface. By simply asking the chat harness to list its internal tools and their exact parameters, Simon documented 16 different built-in tools. Standouts include a Python Code Interpreter (container.python_execution) running Python 3.9 and SQLite 3.34.1, mechanisms for creating web artifacts, and a highly capable container.visual_grounding tool. He ran hands-on experiments generating images of a raccoon wearing trash, then used the platform’s Python sandbox and grounding tools to extract precise, nested bounding boxes and perform object counts (like counting whiskers or his classic pelicans). Although the model is closed for now, infrastructure scaling and comments from Alexandr Wang suggest future versions could be open-sourced.
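As a rough illustration of what working with nested bounding boxes and object counts can look like in a Python sandbox, the sketch below counts boxes per label and finds, for each box, the smallest other box that fully contains it. The `Box` structure, the coordinates, and the labels are invented for this example. The actual output format of container.visual_grounding is not described in the summary above and is not assumed here.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    """Hypothetical detection: a label plus pixel corner coordinates."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def area(box: Box) -> float:
    return (box.x2 - box.x1) * (box.y2 - box.y1)


def contains(outer: Box, inner: Box) -> bool:
    """True when `inner` lies entirely within `outer` (edges may touch)."""
    return (
        outer.x1 <= inner.x1
        and outer.y1 <= inner.y1
        and outer.x2 >= inner.x2
        and outer.y2 >= inner.y2
    )


def parent_index(boxes: List[Box], i: int) -> Optional[int]:
    """Index of the smallest other box containing boxes[i], or None if there is none."""
    candidates = [
        j for j, other in enumerate(boxes) if j != i and contains(other, boxes[i])
    ]
    return min(candidates, key=lambda j: area(boxes[j]), default=None)


# Invented sample data standing in for grounding results.
boxes = [
    Box("raccoon", 10, 10, 200, 300),
    Box("face", 50, 20, 150, 100),
    Box("whisker", 60, 70, 80, 75),
    Box("whisker", 120, 70, 140, 75),
    Box("whisker", 125, 80, 145, 85),
]

counts = Counter(box.label for box in boxes)
parents = [parent_index(boxes, i) for i in range(len(boxes))]
```

Here `counts` gives the per-label totals and `parents` encodes the nesting, with each whisker attached to the face box and the face attached to the raccoon.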

2026-04-11

Simon Willison — 2026-04-11

Highlight

The standout update today is the release of SQLite 3.53.0. Simon highlights the long-awaited native ALTER TABLE constraint improvements and shows off his rapid-prototyping workflow, using Claude Code on his phone to build a WebAssembly-powered playground for the database’s new Query Result Formatter.

Posts

SQLite 3.53.0 · Source

This is a substantial release following the withdrawal of SQLite 3.52.0, packed with accumulated user-facing and internal improvements. Simon specifically highlights that ALTER TABLE can now directly add and remove NOT NULL and CHECK constraints, a change he previously had to make with his own sqlite-utils transform() method. The update also introduces json_array_insert() (alongside its jsonb equivalent) and brings significant upgrades to the CLI’s result formatting via a new Query Result Formatter library. True to form, Simon used AI assistance, specifically Claude Code on his phone, to compile this new C library to WebAssembly and build a custom playground interface.
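For context on why native constraint changes matter, here is a minimal sketch of the general rebuild-and-rename pattern that SQLite's documentation describes for schema changes ALTER TABLE cannot express. It uses only Python's standard `sqlite3` module. It is neither the sqlite-utils implementation nor the new 3.53.0 syntax, and the `people` table and its columns are hypothetical.

```python
import sqlite3

# Autocommit mode, so the explicit BEGIN/COMMIT below controls the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Hypothetical starting table with no constraints.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO people (name, age) VALUES ('Ada', 36), ('Grace', 45)")


def rebuild_people_with_constraints(conn: sqlite3.Connection) -> None:
    """Recreate `people` with NOT NULL and CHECK constraints, preserving its rows."""
    conn.execute("BEGIN")
    try:
        # 1. Create a new table with the desired constraints.
        conn.execute(
            """
            CREATE TABLE people_new (
                id INTEGER PRIMARY KEY,
                name TEXT NOT NULL,
                age INTEGER CHECK (age >= 0)
            )
            """
        )
        # 2. Copy the existing rows across.
        conn.execute(
            "INSERT INTO people_new (id, name, age) SELECT id, name, age FROM people"
        )
        # 3. Swap the new table in under the old name.
        conn.execute("DROP TABLE people")
        conn.execute("ALTER TABLE people_new RENAME TO people")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


rebuild_people_with_constraints(conn)
rows = conn.execute("SELECT name, age FROM people ORDER BY id").fetchall()
```

A real migration would also need to recreate any indexes, triggers, and views that referenced the original table, which this sketch leaves out.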

2026-05-10

Simon Willison — 2026-05-10

Highlight

Simon highlights a stark example of AI hallucination making its way into mainstream journalism, serving as a critical warning for anyone relying on LLMs for factual summarization.

Posts

Quoting New York Times Editors’ Note · Source

Simon shares a sobering editors’ note from the New York Times illustrating the dangers of unchecked generative AI in the newsroom. A reporter mistakenly presented an AI-generated summary of Canadian Conservative leader Pierre Poilievre’s views as a direct, verbatim quote. The hallucinated text falsely claimed he called politicians who changed allegiances “turncoats,” underscoring why LLM outputs must be rigorously verified against primary sources rather than trusted blindly.

Simon Willison

Simon Willison — 2026-05-29

Highlight

Today’s most significant update is the release of Datasette 1.0a31, a major change for the project that introduces UI support for executing write queries directly against the database.

Posts

datasette 1.0a31

Simon has released a major alpha for Datasette with a highly requested feature: users with the right permissions can now execute write queries and save “stored queries” (formerly “canned queries”) directly in the UI. This allows developers to set up templated insert, update, and delete operations against their databases. The release announcement is also the third post on the recently launched Datasette blog, highlighting his ongoing push for better project documentation.
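The idea of templated write operations can be sketched with SQLite named parameters (`:name`), which let the same SQL be reused with different values. The example below uses only Python's standard `sqlite3` module and is not Datasette's code. The `notes` table, the query constants, and the `run_write` helper are hypothetical, and whether stored queries in 1.0a31 are configured exactly this way is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")

# Hypothetical templated queries; :title, :body and :id are filled in per call.
INSERT_NOTE = "INSERT INTO notes (title, body) VALUES (:title, :body)"
UPDATE_NOTE = "UPDATE notes SET body = :body WHERE id = :id"
DELETE_NOTE = "DELETE FROM notes WHERE id = :id"


def run_write(conn: sqlite3.Connection, sql: str, params: dict) -> int:
    """Execute one parameterized write and return the number of affected rows."""
    with conn:  # commits on success, rolls back if the statement raises
        cursor = conn.execute(sql, params)
    return cursor.rowcount


inserted = run_write(conn, INSERT_NOTE, {"title": "First", "body": "draft"})
updated = run_write(conn, UPDATE_NOTE, {"id": 1, "body": "final"})
after_update = conn.execute("SELECT title, body FROM notes").fetchall()
deleted = run_write(conn, DELETE_NOTE, {"id": 1})
after_delete = conn.execute("SELECT count(*) FROM notes").fetchone()[0]
```

Because the values are bound as parameters rather than interpolated into the SQL string, user input is treated as data and cannot change the structure of the statement.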