<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llms on MacWorks</title><link>https://macworks.dev/tags/llms/</link><description>Recent content in Llms on MacWorks</description><generator>Hugo</generator><language>en</language><atom:link href="https://macworks.dev/tags/llms/index.xml" rel="self" type="application/rss+xml"/><item><title>2026-04-13</title><link>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-13/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-13/</guid><description>&lt;h1 id="simon-willison--2026-04-13"&gt;Simon Willison — 2026-04-13&lt;a class="anchor" href="#simon-willison--2026-04-13"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="highlight"&gt;Highlight&lt;a class="anchor" href="#highlight"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Today&amp;rsquo;s standout is Simon&amp;rsquo;s hands-on research into the newly released &lt;code&gt;servo&lt;/code&gt; crate using Claude Code. It perfectly captures his classic approach to AI-assisted exploration, demonstrating how quickly you can prototype a Rust CLI tool and evaluate WebAssembly compatibility with an LLM sidekick.&lt;/p&gt;
&lt;h2 id="posts"&gt;Posts&lt;a class="anchor" href="#posts"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2026/Apr/13/servo-crate-exploration/#atom-everything"&gt;Exploring the new servo crate&lt;/a&gt;&lt;/strong&gt;
Following the initial release of the embeddable &lt;code&gt;servo&lt;/code&gt; browser engine on crates.io, Simon tasked Claude Code for web with exploring its capabilities. The AI successfully generated a working Rust CLI tool called &lt;code&gt;servo-shot&lt;/code&gt; for taking web screenshots. While compiling Servo itself to WebAssembly proved infeasible due to its heavy use of threads and SpiderMonkey dependencies, Claude instead built a playground page using a WebAssembly build of the &lt;code&gt;html5ever&lt;/code&gt; and &lt;code&gt;markup5ever_rcdom&lt;/code&gt; crates to parse HTML fragments.&lt;/p&gt;</description></item><item><title>Engineer Reads</title><link>https://macworks.dev/docs/today/engineer-blogs-2026-04-14/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/today/engineer-blogs-2026-04-14/</guid><description>&lt;h1 id="engineering-reads--2026-04-14"&gt;Engineering Reads — 2026-04-14&lt;a class="anchor" href="#engineering-reads--2026-04-14"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="the-big-idea"&gt;The Big Idea&lt;a class="anchor" href="#the-big-idea"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The defining characteristic of good software engineering isn&amp;rsquo;t output volume, but the human constraints—specifically &amp;ldquo;laziness&amp;rdquo; and &amp;ldquo;doubt&amp;rdquo;—that force us to distill complexity into crisp abstractions and exercise restraint. As AI effortlessly generates code and acts on probabilistic certainty, our primary architectural challenge is deliberately designing simplicity and deferral into these systems.&lt;/p&gt;
&lt;h2 id="deep-reads"&gt;Deep Reads&lt;a class="anchor" href="#deep-reads"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;[Fragments: April 14]&lt;/strong&gt; · Martin Fowler · &lt;a href="https://martinfowler.com/fragments/2026-04-14.html"&gt;Martin Fowler&amp;rsquo;s Blog&lt;/a&gt;
Fowler synthesizes recent reflections on how AI-native development challenges our classical engineering virtues. He draws on Bryan Cantrill to argue that human &amp;ldquo;laziness&amp;rdquo;—our finite time and cognitive limits—is the forcing function for elegant abstractions, whereas LLMs inherently lack this constraint and will happily generate endless layers of garbage to solve a problem. Through a personal anecdote about simplifying a playlist generator via YAGNI rather than throwing an AI coding agent at it, he highlights the severe risk of LLM-induced over-complication. The piece then shifts to adapting our practices, touching on Jessitron&amp;rsquo;s application of Test-Driven Development to multi-agent workflows and Mark Little&amp;rsquo;s advocacy for AI architectures that value epistemological &amp;ldquo;doubt&amp;rdquo; over decisive certainty. Engineers navigating the integration of LLMs into their daily workflows should read this to re-calibrate their mental models around the enduring value of human constraints and system restraint.&lt;/p&gt;</description></item><item><title>Engineer Reads</title><link>https://macworks.dev/docs/week/blogs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/blogs/</guid><description>&lt;h1 id="engineering-reads--week-of-2026-04-02-to-2026-04-10"&gt;Engineering Reads — Week of 2026-04-02 to 2026-04-10&lt;a class="anchor" href="#engineering-reads--week-of-2026-04-02-to-2026-04-10"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="week-in-review"&gt;Week in Review&lt;a class="anchor" href="#week-in-review"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This week&amp;rsquo;s reading reflects a fundamental inflection point: raw LLM intelligence is no longer the bottleneck in software development. Instead, the industry is pivoting toward the hard systems engineering required to constrain probabilistic models—whether through strict data ledgers, living specifications, or formal verification harnesses. The dominant debate centers on how we preserve architectural taste, mechanical sympathy, and system ethics as the mechanical act of writing code becomes increasingly commoditized.&lt;/p&gt;</description></item><item><title>2026-04-10</title><link>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-10/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-10/</guid><description>&lt;h1 id="simon-willison--2026-04-10"&gt;Simon Willison — 2026-04-10&lt;a class="anchor" href="#simon-willison--2026-04-10"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="highlight"&gt;Highlight&lt;a class="anchor" href="#highlight"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Simon points out the non-obvious reality that ChatGPT&amp;rsquo;s Advanced Voice Mode is running on an older, weaker model than OpenAI&amp;rsquo;s flagship coding models. Drawing on insights from Andrej Karpathy, he highlights the widening capability gap between consumer-facing voice interfaces and B2B-focused reasoning models that benefit from reinforcement learning on verifiable rewards.&lt;/p&gt;
&lt;h2 id="posts"&gt;Posts&lt;a class="anchor" href="#posts"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2026/Apr/10/voice-mode-is-weaker/#atom-everything"&gt;ChatGPT voice mode is a weaker model&lt;/a&gt;&lt;/strong&gt;
Simon reflects on the counterintuitive fact that OpenAI&amp;rsquo;s Advanced Voice Mode runs on a GPT-4o era model with an April 2024 knowledge cutoff. Prompted by a tweet from Andrej Karpathy, he contrasts this consumer feature with top-tier coding models capable of coherently restructuring entire codebases or finding system vulnerabilities. Karpathy notes this divergence in capabilities exists because coding tasks offer explicit, verifiable reward functions ideal for reinforcement learning and hold significantly more B2B value.&lt;/p&gt;</description></item><item><title>2026-04-08</title><link>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-08/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-08/</guid><description>&lt;h1 id="simon-willison--2026-04-08"&gt;Simon Willison — 2026-04-08&lt;a class="anchor" href="#simon-willison--2026-04-08"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="highlight"&gt;Highlight&lt;a class="anchor" href="#highlight"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The most substantial piece today is a deep-dive into Meta&amp;rsquo;s new Muse Spark model and its chat harness, where Simon successfully extracts the platform&amp;rsquo;s system tool definitions via direct prompting. His exploration of Meta&amp;rsquo;s built-in Python Code Interpreter and &lt;code&gt;visual_grounding&lt;/code&gt; capabilities highlights a powerful, sandbox-driven approach to combining generative AI with programmatic image analysis and exact object localization.&lt;/p&gt;
&lt;h2 id="posts"&gt;Posts&lt;a class="anchor" href="#posts"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2026/Apr/8/muse-spark/#atom-everything"&gt;Meta’s new model is Muse Spark, and meta.ai chat has some interesting tools&lt;/a&gt;&lt;/strong&gt;
Meta has launched Muse Spark, a new hosted model currently accessible as a private API preview and directly via the meta.ai chat interface. By simply asking the chat harness to list its internal tools and their exact parameters, Simon documented 16 different built-in tools. Standouts include a Python Code Interpreter (&lt;code&gt;container.python_execution&lt;/code&gt;) running Python 3.9 and SQLite 3.34.1, mechanisms for creating web artifacts, and a highly capable &lt;code&gt;container.visual_grounding&lt;/code&gt; tool. He ran hands-on experiments generating images of a raccoon wearing trash, then used the platform&amp;rsquo;s Python sandbox and grounding tools to extract precise, nested bounding boxes and perform object counts (like counting whiskers or his classic pelicans). Although the model is closed for now, infrastructure scaling and comments from Alexandr Wang suggest future versions could be open-sourced.&lt;/p&gt;</description></item><item><title>2026-04-04</title><link>https://macworks.dev/docs/week/blogs/engineer-blogs-2026-04-04/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/blogs/engineer-blogs-2026-04-04/</guid><description>&lt;h1 id="engineering-reads--2026-04-04"&gt;Engineering Reads — 2026-04-04&lt;a class="anchor" href="#engineering-reads--2026-04-04"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="the-big-idea"&gt;The Big Idea&lt;a class="anchor" href="#the-big-idea"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Raw LLM intelligence is no longer the primary bottleneck for AI-assisted development; the real engineering challenge is building the system scaffolding—memory, tool execution, and repository context—that turns a stateless model into an effective, autonomous coding agent.&lt;/p&gt;
&lt;h2 id="deep-reads"&gt;Deep Reads&lt;a class="anchor" href="#deep-reads"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;[Components of A Coding Agent]&lt;/strong&gt; · Sebastian Raschka · &lt;a href="https://magazine.sebastianraschka.com/p/components-of-a-coding-agent"&gt;Sebastian Raschka Magazine&lt;/a&gt;
The core insight of this piece is that an LLM alone is just a stateless text generator; to do useful software engineering, it needs a surrounding agentic architecture. Raschka details the necessary scaffolding: equipping the model with tool use, stateful memory, and deep repository context. The technical mechanism relies on building an environment where the model can fetch file structures, execute commands, and persist state across conversational turns rather than just blindly emitting isolated code snippets. The tradeoff here is a steep increase in system complexity—managing context windows, handling tool execution failures, and maintaining state transitions is often much harder than prompting the model itself. Systems engineers and developers building AI integrations should read this to understand the practical anatomy of modern autonomous developer tools.&lt;/p&gt;</description></item><item><title>2026-04-07</title><link>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-07/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/simonwillison/simonwillison-2026-04-07/</guid><description>&lt;h1 id="simon-willison--2026-04-07"&gt;Simon Willison — 2026-04-07&lt;a class="anchor" href="#simon-willison--2026-04-07"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="highlight"&gt;Highlight&lt;a class="anchor" href="#highlight"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Anthropic&amp;rsquo;s decision to restrict access to their new Claude Mythos model underscores a massive, sudden shift in AI capabilities. It is a fascinating look at an industry-wide reckoning as open-source maintainers transition from dealing with &amp;ldquo;AI slop&amp;rdquo; to facing a tsunami of highly accurate, sophisticated vulnerability reports.&lt;/p&gt;
&lt;h2 id="posts"&gt;Posts&lt;a class="anchor" href="#posts"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2026/Apr/7/project-glasswing/#atom-everything"&gt;Anthropic’s Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me&lt;/a&gt;&lt;/strong&gt;
Anthropic has delayed the general release of Claude Mythos, a general-purpose model similar to Claude Opus 4.6, opting instead to limit access to trusted partners under &amp;ldquo;Project Glasswing&amp;rdquo; so they can patch foundational internet systems. Simon digs into the context, tracking how credible security professionals are warning about the ability of frontier LLMs to chain multiple minor vulnerabilities into sophisticated exploits. He even uses &lt;code&gt;git blame&lt;/code&gt; to independently confirm that an OpenBSD kernel bug discovered by the model had gone unnoticed for 27 years. He concludes that delaying the release until new safeguards are built, while providing $100M in credits to defenders, is a highly reasonable trade-off.&lt;/p&gt;</description></item><item><title>2026-04-05</title><link>https://macworks.dev/docs/archives/simonwillison/simonwillison-2026-04-05/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/archives/simonwillison/simonwillison-2026-04-05/</guid><description>&lt;h1 id="simon-willison--2026-04-05"&gt;Simon Willison — 2026-04-05&lt;a class="anchor" href="#simon-willison--2026-04-05"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="highlight"&gt;Highlight&lt;a class="anchor" href="#highlight"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Simon highlights a deep-dive post by Lalit Maganti on the realities of &amp;ldquo;agentic engineering&amp;rdquo; when building a robust SQLite parser. The piece beautifully articulates a crucial lesson for our space: while AI is incredible at plowing through tedious low-level implementation details, it struggles significantly with high-level design and architectural decisions where there isn&amp;rsquo;t an objectively right answer.&lt;/p&gt;
&lt;h2 id="posts"&gt;Posts&lt;a class="anchor" href="#posts"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2026/Apr/5/building-with-ai/#atom-everything"&gt;Eight years of wanting, three months of building with AI&lt;/a&gt;&lt;/strong&gt;
Simon shares a standout piece of long-form writing by Lalit Maganti on the process of building &lt;code&gt;syntaqlite&lt;/code&gt;, a parser and formatter for SQLite. Claude Code was instrumental in overcoming the initial hurdle of implementing 400+ tedious grammar rules, allowing Lalit to rapidly vibe-code a working prototype. However, the post cautions that relying on AI for architectural design led to deferred decisions and a confusing codebase, ultimately requiring a complete rewrite with more human-in-the-loop decision making. The core takeaway is that while AI excels at tasks with objectively checkable answers, it remains weak at subjective design and system architecture.&lt;/p&gt;</description></item><item><title>Simon Willison</title><link>https://macworks.dev/docs/week/simonwillison/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://macworks.dev/docs/week/simonwillison/</guid><description>&lt;h1 id="simon-willison--week-of-2026-04-04-to-2026-04-10"&gt;Simon Willison — Week of 2026-04-04 to 2026-04-10&lt;a class="anchor" href="#simon-willison--week-of-2026-04-04-to-2026-04-10"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="highlight-of-the-week"&gt;Highlight of the Week&lt;a class="anchor" href="#highlight-of-the-week"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Anthropic&amp;rsquo;s decision to delay the general release of their highly capable Claude Mythos model under &amp;ldquo;Project Glasswing&amp;rdquo; marks a significant turning point in the AI industry. The move underscores a massive shift in frontier model capabilities, as models evolve from generating text to autonomously chaining multiple minor vulnerabilities into sophisticated exploits, requiring a new level of security safeguards before release.&lt;/p&gt;</description></item></channel></rss>