Week 19 Summary

Simon Willison — Week of 2026-04-18 to 2026-05-01#

Highlight of the Week#

The alpha release of llm 0.32a0 marks a foundational architectural pivot for Simon’s ecosystem of CLI tools. By moving away from a simple text-in/text-out abstraction to one that natively models complex message sequences and typed streams, the library is now future-proofed to handle the realities of modern frontier models. This opens the door for seamless integration of server-side tool calls, multi-modal inputs, and reasoning tokens.

2026-04-28

Simon Willison — 2026-04-28#

Highlight#

The most fascinating read today is the breakdown of talkie, a 13B vintage language model trained purely on pre-1931 text. It raises excellent questions about training data purity (“vegan models”) and the difficulty of preventing anachronistic contamination when fine-tuning with modern AI.

Posts#

[Introducing talkie: a 13B vintage language model from 1930] · Source Nick Levine, David Duvenaud, and Alec Radford have released an Apache 2.0-licensed 13B model trained entirely on 260 billion tokens of pre-1931, out-of-copyright text. Simon dives into the concept of “vegan models”—LLMs trained solely on licensed or public domain data—noting that while talkie’s base model qualifies, its chat-finetuned version relies on Claude Sonnet and Opus for preference optimization and synthetic chats. This creates an anachronistic contamination problem, though the team ultimately hopes to use their vintage models as judges to bootstrap an era-appropriate post-training pipeline. When tested with a classic prompt for an SVG of a pelican riding a bicycle, the 1930 model generated a highly amusing, historically framed textual description instead.