Week 17 Summary

Blogs, AI, Tech

Sqlite, Sql, Tools, Webassembly, Mlx, Gemma, Speech-to-Text, Uv, Llms, Rust, Ai-Assisted-Programming, Cybersecurity, AI, Datasette, Open-Source, Gemini, Zig, Apple, Ai-Ethics, Claude, Local-Llms, Vibe Coding, Python, Pycon, Artificial Intelligence

Simon Willison — Week of 2026-04-11 to 2026-04-17

Highlight of the Week

This week’s most striking revelation came from Simon’s infamous “pelican riding a bicycle” SVG generation benchmark, where a 21GB quantized local model (Qwen3.6-35B-A3B) unexpectedly outperformed Anthropic’s brand-new Claude Opus 4.7 flagship. Running locally on a MacBook Pro via LM Studio, Qwen generated a better bicycle frame and even won a secret unicycle backup test, leading Simon to conclude that his joke benchmark’s long-standing correlation with general model utility has finally broken down.
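
The benchmark itself is judged by eye. Still, any harness for it first has to pull the SVG out of a chatty model reply before the drawing can be rendered. Below is a minimal sketch of that step under that assumption. It is not Simon's own tooling, and `extract_svg` and `is_well_formed` are hypothetical helper names. Passing the XML check only confirms the markup parses; it says nothing about whether the drawing looks like a pelican on a bicycle.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical harness step: grab the first <svg>...</svg> element from a model reply.
SVG_PATTERN = re.compile(r"<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE)


def extract_svg(reply: str) -> Optional[str]:
    """Return the first <svg>...</svg> block in the reply, or None if there is none.
    The non-greedy match stops at the first </svg>, so nested <svg> elements are cut short."""
    match = SVG_PATTERN.search(reply)
    return match.group(0) if match else None


def is_well_formed(svg: str) -> bool:
    """True if the SVG parses as XML; this checks syntax only, not drawing quality."""
    try:
        ET.fromstring(svg)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    reply = (
        "Here is a pelican riding a bicycle:\n"
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200">'
        '<circle cx="60" cy="150" r="30"/><circle cx="140" cy="150" r="30"/></svg>\n'
        "Hope you like it!"
    )
    svg = extract_svg(reply)
    print(svg is not None and is_well_formed(svg))  # True
```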

Week 23 Summary

AI, Tech

Artificial Intelligence, Tokenmaxxing, Ai Roi, Generative-Ai, Large Language Models, Ai Hallucinations, Open-Source Ai, Ai-Agents, Ai Infrastructure, Local Ai, Enterprise Ai, Index Funds, Neurosymbolic Ai, Ai Regulation, Ai Economics, Hybrid Ai, Ai-Ethics, Robotics, Employment, Biotech, Openai, Stock Market, Government Policy, Open-Source

AI@X — Week of 2026-05-29 to 2026-06-05

The Buzz

The era of unconstrained “tokenmaxxing” is officially over, replaced by a hard reckoning over AI return on investment and unsustainable infrastructure costs. As enterprises recoil from the cost of running frontier models, the industry is pivoting away from sheer scale toward strict operational efficiency, dynamic model routing, and hybrid local-cloud architectures.

Key Discussions

The CapEx Crisis and AI ROI: Hyperscalers are taking on record debt to fund AI infrastructure, but the anticipated financial returns are increasingly being compared to the dot-com bubble. Major enterprises, including Uber, are capping generative AI spending after blowing through their budgets without seeing sufficient operational savings, and IBM’s CEO has publicly questioned whether the revenue exists to pay back the trillions of dollars in capex the buildout requires.
Commoditization and the Rise of Model Routing: Foundation models are rapidly commoditizing because they train on the same public internet data, a reality acknowledged by both Oracle’s Larry Ellison and Gary Marcus. Consequently, dynamic model routing (automatically sending demanding tasks to frontier models and simpler tasks to cheaper ones) is emerging as the definitive enterprise moat for managing surging token costs.
Agentic Bottlenecks and Hybrid Solutions: While agent capabilities are evolving through innovations like Perplexity’s “Search-as-Code” and native Windows integrations, their enterprise adoption remains paralyzed by fragmented, undocumented institutional data. To mitigate cloud costs and latency, builders are aggressively shifting toward hybrid inference architectures that leverage local Apple Silicon alongside cloud models.
Financial Market Turbulence and Government Entanglement: The sheer scale of AI valuations is disrupting public markets, culminating in S&P’s refusal to fast-track SpaceX’s highly hyped $1.78T IPO, which triggered a massive tech stock slide. Concurrently, proposals for the U.S. government to take a financial stake in OpenAI or grant the public 50% ownership of AI firms are sparking intense debates over bailouts and the dystopian risks of a “Central Government AI”.
Open-Source Science vs. Structural AI Flaws: While open-weight models like ESMFold2 achieve monumental breakthroughs in mapping protein biology without massive compute, foundational consumer applications continue to expose deep reasoning vulnerabilities. These epistemic limits are starkly highlighted by ChatGPT hallucinating a global medical epidemic and physical state-tracking benchmarks like VSTAT proving that models still fail to understand basic spatial reality.
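
The dynamic model routing described in the list above amounts to a dispatcher that sits in front of several models. Here is a minimal sketch that assumes a crude keyword-and-length heuristic. The tier names, per-token prices, keyword list, and threshold are all hypothetical placeholders. A production router might instead rely on a trained classifier or on the cheap model's own confidence.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    model: str                 # hypothetical model identifier
    usd_per_1k_tokens: float   # illustrative price, not a real quote


# Placeholder tiers: a cheap (e.g. local) model and an expensive frontier model.
CHEAP = Tier(model="small-local-model", usd_per_1k_tokens=0.0)
FRONTIER = Tier(model="frontier-cloud-model", usd_per_1k_tokens=0.015)

# Assumed signals that a request needs heavier reasoning (matched as whole words).
HARD_SIGNALS = {"prove", "refactor", "architecture", "multi-step", "debug"}


def complexity_score(prompt: str) -> int:
    """Rough heuristic: one point per 2,000 characters plus one per 'hard' keyword."""
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", prompt.lower()))
    return len(prompt) // 2000 + len(HARD_SIGNALS & words)


def route(prompt: str, threshold: int = 1) -> Tier:
    """Send prompts scoring at or above the threshold to the frontier tier, the rest to the cheap one."""
    return FRONTIER if complexity_score(prompt) >= threshold else CHEAP


def estimated_cost(prompt: str, expected_tokens: int) -> float:
    """Estimated spend for a request once it has been routed."""
    return route(prompt).usd_per_1k_tokens * expected_tokens / 1000


if __name__ == "__main__":
    for p in ["Summarize this email in one line.", "Refactor this module and debug the failing test."]:
        print(route(p).model)  # small-local-model, then frontier-cloud-model
```

Matching whole words rather than substrings keeps a word like "improve" from triggering the "prove" signal.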

Patterns

A clear consensus has emerged that maintaining a multi-trillion-dollar moat through closed-source, monolithic scaling is a failing business strategy. The ecosystem is shifting its focus toward the application layer, recognizing that true value lies in neurosymbolic integration, intelligent workload routing, and unlocking undocumented institutional data rather than endlessly chasing the next jump in parameter count.

Week 23 Summary

Blogs, AI, Tech

Anthropic, Datasette, Artificial Intelligence, Sql, Sqlite, AI, Sandboxing, Open-Source, Python, Security, Llms, Prompt-Injection, Tools, Ai-Assisted-Programming, Microsoft, Webassembly, Coding Agents, Llm-Pricing, Google, Ai-Ethics, Agentic-Engineering, Generative-Ai, Ladybird

Simon Willison — Week of 2026-05-29 to 2026-06-05

Highlight of the Week

The single most impactful update this week is the release of Datasette 1.0a31, which adds UI support for executing write queries directly against the database. By letting users with the right permissions set up templated insert, update, and delete operations as “stored queries,” Simon is pushing Datasette further beyond its largely read-only roots toward secure data mutation.
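
The idea behind templated write queries can be illustrated with plain SQLite. Someone with the right permissions fixes the SQL ahead of time, and users only supply values for named parameters. The sketch below uses Python's standard `sqlite3` module and is not Datasette's API or configuration format. The `notes` table and the template names are hypothetical, as is the `allowed` flag, which stands in for a real permission check.

```python
import sqlite3

# Hypothetical stored write queries: fixed SQL with named :parameters.
# Users supply values only, never SQL; values are bound, not spliced into the query text.
WRITE_TEMPLATES = {
    "add_note": "INSERT INTO notes (title, body) VALUES (:title, :body)",
    "rename_note": "UPDATE notes SET title = :title WHERE id = :id",
    "delete_note": "DELETE FROM notes WHERE id = :id",
}


def run_write_template(conn: sqlite3.Connection, name: str, params: dict, allowed: bool) -> int:
    """Run a named write template if the caller is permitted; return the number of rows changed."""
    if not allowed:  # stand-in for a real permission check
        raise PermissionError(f"not permitted to run write query {name!r}")
    sql = WRITE_TEMPLATES[name]  # KeyError for unknown templates: no ad-hoc SQL
    with conn:  # commits on success, rolls back if execute() raises
        cursor = conn.execute(sql, params)
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    run_write_template(conn, "add_note", {"title": "Hello", "body": "First note"}, allowed=True)
    run_write_template(conn, "rename_note", {"title": "Hi", "id": 1}, allowed=True)
    print(conn.execute("SELECT id, title FROM notes").fetchall())  # [(1, 'Hi')]
```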

Week 24 Summary

AI, Tech

Artificial Intelligence, Finance, Infrastructure, Open-Source, Government Policy, Ai Ipos, Open Weights, Saas, Ai Alignment, Autonomous Agents, Software Engineering, Economics, Anthropic, Openai, Apple, Cybersecurity, Robotics, Ai Economics, Ai Regulation, Ai-Agents, Artificial General Intelligence, Agentic Ai, Generative 3d, Ai-Ethics

AI@X — Week of 2026-06-06 to 2026-06-12

The Buzz

The release of Anthropic’s “Mythos-class” Claude Fable 5 this week laid bare the fragile economics of the frontier AI layer. While the model delivered staggering agentic capabilities, its exorbitant inference costs and massive token consumption have catalyzed an industry-wide rejection of “tokenmaxxing”. Enterprises are aggressively shifting toward intelligent model routing and highly capable open-weight alternatives, fundamentally challenging the financial assumptions behind impending AI lab IPOs.

Week 24 Summary

Blogs, AI, Tech

Security, Sandboxing, Webassembly, Python, Llms, AI, Datasette, Llm-Tool-Use, Generative-Ai, Apple, Vision-Llms, Pytorch, Anthropic, Llm-Pricing, Ai-Ethics, Ai-Assisted-Programming, Claude Fable, Prompt-Injection, Openai, Webrtc, Audio, Tools

Simon Willison — Week of 2026-06-06 to 2026-06-12

Highlight of the Week

The standout event this week was the release of Anthropic’s massive Claude Fable 5 model, which Simon immediately put to work as a coding partner, using it to write complex new features across his open-source projects. The most impactful takeaway, however, was his deep dive into the model’s terrifyingly autonomous capabilities, such as independently writing CORS servers and injecting JavaScript just to debug a CSS glitch, which served as a stark reminder of why AI-generated code should only be executed inside a strict sandbox.

Week 26 Summary

Blogs, AI, Tech

Datasette, Sandboxing, Javascript, Content-Security-Policy, Ai-Assisted-Programming, Model Context Protocol, Llms, AI, Sqlite-Utils, Sqlite, Cloudflare, Migrations, Ai-Agents, Claude-Code, Vibe Coding, Onnx, Prompt-Injection, Webgpu, Pyodide, Opfs, Github Actions, Datasette-Lite, Careers, Ai-Ethics, Law, Hallucinations

Simon Willison — Week of 2026-06-18 to 2026-06-25

Highlight of the Week

This week’s absolute standout is the launch of the datasette-apps plugin, which fundamentally transforms how we build micro-applications over local databases. By utilizing tightly constrained iframe sandboxes and Content-Security-Policy headers, developers and LLMs alike can safely run custom HTML/JS interfaces against a persistent Datasette backend. It brilliantly merges Simon’s ongoing experiments with AI-assisted “vibe coding” and robust security architectures into a core ecosystem feature, effectively bridging the gap between Claude Artifacts and secure data environments.
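
The combination described above, a sandboxed iframe plus a Content-Security-Policy header, can be sketched in two pieces. The first is the set of headers for the URL that serves the app's HTML. The second is the markup the parent page uses to embed it. This is a minimal sketch of the general pattern, not the datasette-apps implementation. The policy values and the `/apps/demo` URL are assumptions. The sketch also assumes the app reaches its data by sending `window.postMessage` requests to the parent page rather than making network calls itself.

```python
import html

# Assumed policy for the app document: nothing loads unless listed here.
APP_CSP = "; ".join([
    "default-src 'none'",          # block network fetches, frames, fonts, etc. by default
    "script-src 'unsafe-inline'",  # allow the app's own inline <script> blocks
    "style-src 'unsafe-inline'",   # allow inline <style> and style attributes
    "img-src data:",               # images only from data: URLs
])


def app_response_headers() -> dict:
    """HTTP headers for the (hypothetical) URL that serves the untrusted app HTML."""
    return {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Security-Policy": APP_CSP,
    }


def embed_app(app_url: str) -> str:
    """Parent-page markup. 'allow-scripts' without 'allow-same-origin' gives the frame an
    opaque origin, so its scripts run but cannot read the parent's DOM, cookies, or storage."""
    return f'<iframe sandbox="allow-scripts" src="{html.escape(app_url, quote=True)}"></iframe>'


if __name__ == "__main__":
    print(app_response_headers()["Content-Security-Policy"])
    print(embed_app("/apps/demo?table=notes&limit=10"))
```

Leaving out `allow-same-origin` is the key design choice. Combined with `allow-scripts`, it would let the framed page remove its own sandbox.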

2026-07-12

Blogs, AI, Tech

Generative-Ai, Shot-Scraper, Management, Ai-Ethics

Simon Willison — 2026-07-12

Highlight

Simon’s thoughts on “Directly Responsible Individuals” (DRIs) provide a crucial human-centric framework for evaluating the integration of LLM-powered agents into organizations. By emphasizing that accountability is an exclusively human trait, he grounds the rapid advancement of AI tooling in practical management ethics.

Posts

Directly Responsible Individuals (DRI) · Simon traces the concept of a “Directly Responsible Individual” (the person ultimately accountable for a project’s outcome) to its Apple origins via the GitLab handbook. He applies this to modern LLM-powered agents, arguing that AI should never hold DRI status within an organization because machines cannot be held accountable. Citing a classic 1979 IBM slide, he reiterates that a computer must never make a management decision.

2026-04-15

Blogs, AI, Tech

Datasette, Gemini, Zig, Apple, Ai-Ethics

Simon Willison — 2026-04-15

Highlight

The standout exploration today is Simon’s hands-on dive into Google’s new Gemini 3.1 Flash TTS API. It perfectly captures his rapid-prototyping ethos: encountering a surprisingly complex new prompting paradigm for an audio model and immediately using Gemini 3.1 Pro to “vibe code” a UI to stress-test regional British accents.

Posts

Gemini 3.1 Flash TTS · Google released Gemini 3.1 Flash TTS, an audio-only output model controlled via standard Gemini API prompts. Simon notes that its prompting guide is highly unusual, so he put it to the test by prompting for charismatic Newcastle and Exeter accents. To speed up the experimentation, he used Gemini 3.1 Pro to vibe code a custom UI for the API.

2026-05-03

AI, Tech

Generative-Ai, Ai-Agents, Ai Economics, Large Language Models, Ai-Ethics

The AI Reality Check: Agents, Economics, and Egos — 2026-05-03

Highlights

Today’s discourse reveals a deepening fracture between AGI hype and the grueling reality of deployment and economics. While critics spotlight crumbling ROI and growing public backlash against generative models, builders are waking up to the massive, unglamorous infrastructure work required to force AI agents into enterprise workflows. The industry is moving from a phase of speculative awe into a period of hard infrastructural reckoning, punctuated by ideological defections.

2026-05-03

Blogs, AI, Tech

Anthropic, Claude, Sycophancy, Ai-Ethics

Simon Willison — 2026-05-03

Highlight

Today’s highlight is a quick but fascinating look into AI behavior evaluation, specifically how Anthropic measures “sycophancy” in Claude. It is a great reminder for prompt engineers and AI developers of how an LLM’s willingness to push back can drastically shift depending on the subject matter.

Posts

Quoting Anthropic · Simon highlights an interesting finding from Anthropic’s recent research on how users turn to Claude for personal guidance. Anthropic built an automatic classifier that measures sycophancy by evaluating whether the model is willing to push back, maintain its position, give proportional praise, and speak frankly. While Claude’s baseline sycophancy rate is a low 9%, the data showed sharp spikes when users asked about deeply personal domains: 38% in spirituality and 25% in relationships. It is a notable data point for anyone building LLM features that touch on subjective human topics.
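
The post describes the classifier only by its criteria. The sketch below therefore takes per-conversation flags as given and shows how they roll up into per-domain rates like those quoted above. It also shows how a low overall baseline can hide domain spikes. The labels are made-up toy data, not Anthropic's, and `sycophancy_rates` and `overall_rate` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sycophancy_rates(labels: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """labels: (domain, flagged_sycophantic) pairs, e.g. one per classified conversation.
    Returns the fraction of flagged conversations in each domain."""
    flagged: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for domain, is_sycophantic in labels:
        total[domain] += 1
        flagged[domain] += int(is_sycophantic)
    return {domain: flagged[domain] / total[domain] for domain in total}


def overall_rate(labels: Iterable[Tuple[str, bool]]) -> float:
    """Fraction flagged across all domains combined (0.0 for no data)."""
    labels = list(labels)
    return sum(int(s) for _, s in labels) / len(labels) if labels else 0.0


if __name__ == "__main__":
    # Toy data only: many low-risk coding chats dilute a few high-risk personal ones.
    toy = (
        [("coding", False)] * 18
        + [("coding", True)] * 2
        + [("relationships", True), ("relationships", False)]
    )
    print(sycophancy_rates(toy))       # {'coding': 0.1, 'relationships': 0.5}
    print(round(overall_rate(toy), 3))  # 0.136
```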