Simon Willison — 2026-06-02

Highlight

The most substantive post today is Simon’s commentary on Microsoft’s newly announced MAI models. They stand out not just for their small parameter counts (5B and 35B) but for Microsoft’s claim that they were trained entirely on “clean and commercially licensed data”. If that claim holds up, it could signal a shift away from models that rely on unlicensed web scrapes.

Posts

Microsoft’s new MAI models · Source Simon dissects the surprise release of two new text LLMs at Microsoft Build: MAI-Thinking-1 (a 35B reasoning model) and MAI-Code-1-Flash (a 5B model for Copilot/VS Code). He is particularly impressed that a 35B model reportedly beats Sonnet 4.6 in human evaluations, since he regularly runs larger models than that locally. The biggest takeaway, however, is Microsoft’s emphasis on using “appropriately licensed” data, which raises the prospect of highly capable code models built without controversial web scraping.

Pasted File Editor · Source Inspired by Claude’s ability to automatically convert large text pastes into file attachments, Simon used Codex desktop to build a prototype that replicates the feature. The tool also opens files directly, generates image thumbnails, and supports drag-and-drop. It is a good example of using AI to quickly build a small quality-of-life developer tool.

California Brown Pelican · Source A quick dispatch from Fort Mason in San Francisco, where Simon is currently attending the Microsoft Build conference. He spotted California Brown Pelicans diving into the water right behind the venue.

Project Pulse

Today’s posts focus on AI-assisted tool building and LLM ecosystem analysis, ranging from hands-on rapid prototyping with Codex to the broader industry trend toward small, cleanly licensed corporate models.


Categories: Blogs, AI, Tech