Week 17 Summary

Simon Willison — Week of 2026-04-11 to 2026-04-17#

Highlight of the Week#

This week’s most striking revelation came from Simon’s infamous “pelican riding a bicycle” SVG generation benchmark, where a 21GB quantized local model (Qwen3.6-35B-A3B) unexpectedly outperformed Anthropic’s brand-new Claude Opus 4.7 flagship. Running locally on a MacBook Pro via LM Studio, Qwen generated a better bicycle frame and even won a secret unicycle backup test, leading Simon to conclude that his joke benchmark’s long-standing correlation with general model utility has finally broken down.

Week 21 Summary

Simon Willison — Week of 2026-05-16 to 2026-05-22#

Highlight of the Week#

The most impactful milestone this week is the official announcement of Datasette Agent, merging Simon’s three years of work on his LLM library directly into Datasette. This conversational AI interface allows users to naturally interrogate their databases, boasting an extensible plugin architecture for charts, image generation, and secure code execution.

Key Posts#

[The last six months in LLMs in five minutes] · Source Simon shared annotated slides from his PyCon US 2026 lightning talk capturing a major inflection point in AI developer tooling. He highlights how coding agents crossed the threshold to become reliable daily drivers, and points to the astonishing capabilities of massive local models running on consumer hardware like Mac Minis.

2026-04-15

Simon Willison — 2026-04-15#

Highlight#

The standout exploration today is Simon’s hands-on dive into Google’s new Gemini 3.1 Flash TTS API. It perfectly captures his rapid-prototyping ethos: encountering a surprisingly complex new prompting paradigm for an audio model and immediately using Gemini 3.1 Pro to “vibe code” a UI to stress-test regional British accents.

Posts#

Gemini 3.1 Flash TTS Google released Gemini 3.1 Flash TTS, an audio-only output model controlled via standard Gemini API prompts. Simon points out that the prompting guide is highly unusual, so he put it to the test by prompting for charismatic Newcastle and Exeter accents. To speed up his experimentation, he used Gemini 3.1 Pro to instantly vibe code a custom UI for the API.

2026-05-19

Simon Willison — 2026-05-19#

Highlight#

Simon’s annotated PyCon US 2026 lightning talk provides a sharp, insightful retrospective on the “November 2025 inflection point,” identifying exactly when coding agents became reliable daily drivers and laptop-grade local models started wildly overperforming. It is a quintessential Willison post that perfectly frames the recent tectonic shifts in AI developer tooling.

Posts#

[The last six months in LLMs in five minutes] · Source Simon shares his annotated slides from a PyCon US 2026 lightning talk summarizing the past six months of LLM developments. He zeroes in on two main themes: coding agents crossing the threshold from “often-work” to “mostly-work” driven by Reinforcement Learning from Verifiable Rewards, and the astonishing capability of local models like the 20.9GB Qwen3.6-35B-A3B and Gemma 4. The post also tracks the recent surge of “Claws” (personal AI assistants running locally on Mac Minis) and features his ongoing “pelican riding a bicycle” SVG visual benchmark to compare models.

2026-05-20

Simon Willison — 2026-05-20#

Highlight#

Simon takes a critical look at Google I/O’s Gemini Spark announcement, digging into the opaque “Antigravity” stack and questioning how Google plans to mitigate prompt injection risks for a tool with deep access to user data. This highlights the growing industry tension between powerful workspace AI agents and fundamental security vulnerabilities.

Posts#

[Google I/O, Gemini Spark, Antigravity] · Source Sticking to his rule of only reviewing generally available tools, Simon breaks down the announcement of Gemini Spark, Google’s new OpenClaw competitor that natively integrates with Workspace apps. He notes a strange FAQ detail claiming Spark runs on “Antigravity”—a moniker applied to a desktop app, a Go-based CLI, and a VS Code fork. Crucially, Simon questions whether Google’s isolated VM approach and Agent Gateway will actually be enough to prevent an “agent security challenger disaster” when handling sensitive data via prompt injection. He also highlights that Google is deprecating its open-source Gemini CLI on June 18th in favor of a closed-source Antigravity CLI.