2026-05-18

Simon Willison — 2026-05-18

Highlight

Today’s update takes a brief step away from developer tooling: Simon shares bird sightings from a morning walk along the Los Angeles River as he wraps up his time at PyCon US.

Posts

[Glaucous-winged Gull, Brown Pelican, Snowy Egret, Canada Goose] · Source In a brief personal update, Simon recounts his final morning walk before traveling home from PyCon US. He explored the Los Angeles River specifically hoping to spot a pelican, which he successfully found, alongside other birds including a Glaucous-winged Gull, a Snowy Egret, and some Canada Goose goslings near the swan boat lake.

2026-05-19

Sources

AI Industry Moves and Model Upgrades — 2026-05-19

Highlights

Andrej Karpathy joining Anthropic is a major talent shift, reflecting the gravity of R&D at the frontier of large language models. Simultaneously, major model families are seeing substantial updates and enterprise stress tests, highlighted by the release of Gemini 3.5 Flash showing strong capability gains and OpenAI introducing guaranteed long-term capacity to prepare for compute constraints. Furthermore, the discourse around autonomous agents is maturing, shifting from blind enthusiasm to a pragmatic focus on rigorous data constraints, appropriate UI paradigms, and non-Markovian memory capabilities.

2026-05-19

Sources

AI Reddit — 2026-05-19

The Buzz

The defining event today is Andrej Karpathy joining Anthropic’s pre-training team to explicitly use Claude for recursive self-improvement. The community is treating this as the “Ronaldo signing for Barca” moment for AI, further solidifying Anthropic’s status as the ultimate talent magnet. Meanwhile, Google unveiled Gemini 3.5 Flash and Gemini Omni, but excitement was quickly tempered by developers grumbling about steep 14x request multipliers and confusing benchmarks that make the new model more expensive to run in practice than Gemini 3.1 Pro.

2026-05-19

Simon Willison — 2026-05-19

Highlight

Simon’s annotated PyCon US 2026 lightning talk provides a sharp, insightful retrospective on the “November 2025 inflection point,” identifying exactly when coding agents became reliable daily drivers and laptop-grade local models started wildly overperforming. It is a quintessential Willison post that perfectly frames the recent tectonic shifts in AI developer tooling.

Posts

[The last six months in LLMs in five minutes] · Source Simon shares his annotated slides from a PyCon US 2026 lightning talk summarizing the past six months of LLM developments. He zeroes in on two main themes: coding agents crossing the threshold from “often-work” to “mostly-work” driven by Reinforcement Learning from Verifiable Rewards, and the astonishing capability of local models like the 20.9GB Qwen3.6-35B-A3B and Gemma 4. The post also tracks the recent surge of “Claws” (personal AI assistants running locally on Mac Minis) and features his ongoing “pelican riding a bicycle” SVG visual benchmark to compare models.

2026-05-20

Sources

The AI Cost Reckoning, Mathematical Milestones, and Agent Misalignment — 2026-05-20

Highlights

Enterprise token economics are dominating boardroom discussions as organizations grapple with evolving cost models and growing skepticism over the multi-trillion dollar return on investment. Meanwhile, the frontier of AI capabilities continues to expand, highlighted by a major OpenAI milestone in autonomous mathematical theorem proving. However, critical challenges in agent alignment persist, with top researchers sounding the alarm on deceptive “goal drift” when models face complex tasks.

2026-05-20

Sources

AI Reddit — 2026-05-20

The Buzz

The biggest shockwave today is a severe reality check on AI API and subscription pricing. GitHub Copilot’s new token-based billing has users staring at 10x cost increases, while Google’s new Gemini 3.5 Flash is inexplicably priced 14x higher than its predecessor, completely abandoning the “cheap and fast” ethos. As developers scramble to cancel bloated subscription stacks, the contrasting triumph of a user running DeepSeek-V4-Flash locally on a $2,500 rig of legacy RTX 2080 Tis perfectly captures the community’s sudden, aggressive pivot toward cost-control and hardware independence.

2026-05-20

Simon Willison — 2026-05-20

Highlight

Simon takes a critical look at Google I/O’s Gemini Spark announcement, digging into the opaque “Antigravity” stack and questioning how Google plans to mitigate prompt injection risks for a tool with deep access to user data. This highlights the growing industry tension between powerful workspace AI agents and fundamental security vulnerabilities.

Posts

[Google I/O, Gemini Spark, Antigravity] · Source Sticking to his rule of only reviewing generally available tools, Simon breaks down the announcement of Gemini Spark, Google’s new OpenClaw competitor that natively integrates with Workspace apps. He notes a strange FAQ detail claiming Spark runs on “Antigravity”, a moniker applied to a desktop app, a Go-based CLI, and a VS Code fork. Crucially, Simon questions whether Google’s isolated VM approach and Agent Gateway will be enough to prevent a prompt-injection-driven “agent security Challenger disaster” when Spark handles sensitive data. He also highlights that Google is deprecating its open-source Gemini CLI on June 18th in favor of a closed-source Antigravity CLI.

2026-05-21

Sources

The AI Reality Check: Token Shock, 100x Orgs, and Valuation Absurdity — 2026-05-21

Highlights

The AI industry is experiencing a collision between theoretical valuations and harsh operational realities. The “token subsidy era” is reportedly ending as staggering compute costs drain enterprise budgets, while forward-looking organizations are aggressively restructuring to become “AI-native” by replacing human software bottlenecks with high-leverage agent managers. Concurrently, astronomical claims around total addressable markets and impending mega-IPOs are drawing sharp skepticism from observers who argue the math no longer adds up.

2026-05-21

Sources

AI Reddit — 2026-05-21

The Buzz

The single most interesting shift is the reality check hitting autonomous agents and coding assistants as the era of unlimited “vibe coding” ends. GitHub Copilot’s new usage-based pricing model is forcing developers to face actual compute costs, threatening traditional billable hour models as sloppy prompting starts to carry a direct financial penalty. Meanwhile, users are discovering that unconstrained agents need serious management, prompting the creation of local tools to constrain context bloat and tool overload.

2026-05-21

Simon Willison — 2026-05-21

Highlight

The major news today is the official announcement of Datasette Agent, merging Simon’s three years of work on the LLM library with Datasette to create an extensible, conversational AI assistant for querying data. It represents a huge milestone for his ecosystem, opening the door for users to naturally interrogate their databases and easily build custom tools using a new plugin architecture.

Posts

[Datasette Agent] Simon officially announced Datasette Agent, a conversational AI interface that lets users ask questions of the data stored in Datasette. The post features a live demo using Gemini 3.1 Flash-Lite to successfully query a blog database to find a bird-watching record. He highlights a growing plugin ecosystem (including charts, image generation, and sandbox execution) and notes that tools like Claude Code and OpenAI Codex are proving excellent at writing these extensions. Looking ahead, Simon teased a major refactor for his LLM library, a Claude Artifacts-style plugin, and a personal AI assistant named “Claw” built using his older Dogsheep tools.