2026-04-13

Simon Willison — 2026-04-13#

Highlight#

Today’s standout is Simon’s hands-on research into the newly released servo crate using Claude Code. It perfectly captures his classic approach to AI-assisted exploration, demonstrating how quickly you can prototype a Rust CLI tool and evaluate WebAssembly compatibility with an LLM sidekick.

Posts#

[Exploring the new servo crate] · Source Following the initial release of the embeddable servo browser engine on crates.io, Simon tasked Claude Code for web with exploring its capabilities. The AI successfully generated a working Rust CLI tool called servo-shot for taking web screenshots. While compiling Servo itself to WebAssembly proved unfeasible due to its heavy use of threads and SpiderMonkey dependencies, Claude instead built a playground page utilizing a WebAssembly build of the html5ever and markup5ever_rcdom crates to parse HTML fragments.

2026-04-14

Sources

The Agentic Enterprise and Liability Battlegrounds — 2026-04-14#

Highlights#

Today’s discussions reveal a sharp dichotomy in the AI ecosystem: while builders are rapidly integrating agentic workflows and local AI into production, the policy and safety landscapes are becoming highly contentious. The signal-rich takeaways highlight enterprises preparing for dedicated “agent deployer” roles, open-source AI advancing on mobile hardware, and a brewing battle over frontier model liability and AI anthropomorphism.

2026-04-14

Sources

AI Reddit — 2026-04-14#

The Buzz#

Tencent’s HY-World 2.0 is officially dropping, bringing open-source multimodal 3D world generation that exports directly to game engines as editable meshes and 3D Gaussian Splatting, pushing well beyond standard video synthesis. Meanwhile, SenseNova’s NEO-unify is turning heads by ditching the VAE and vision encoder entirely for a 2B parameter native image generation architecture that processes raw pixels with an impressive 31.56 PSNR. On the cybersecurity front, OpenAI quietly rolled out GPT-5.4-Cyber to trusted testers to rival Anthropic’s Mythos, just as the UK AI Security Institute reported Mythos successfully completed 3 out of 10 simulated corporate network attacks without human intervention.

2026-04-14

Simon Willison — 2026-04-14#

Highlight#

Simon highlights a fascinating paradigm shift in AI security: treating vulnerability discovery as an economic “proof of work” equation where spending more tokens yields better hardening. This creates a compelling new argument for the enduring value of open-source libraries in the age of vibe-coding, as the massive cost of AI security reviews can be shared across all of a project’s users.

Posts#

[datasette PR #2689: Replace token-based CSRF with Sec-Fetch-Site header protection] · Source Simon has replaced Datasette’s cumbersome token-based CSRF protection with a new middleware relying on the Sec-Fetch-Site header, inspired by Filippo Valsorda’s research and recent changes in Go 1.25. This modern approach eliminates the need to scatter hidden CSRF token inputs throughout templates or selectively disable protection for external APIs. Interestingly, while Claude Code handled the bulk of the commits under Simon’s guidance with cross-review by GPT-5.4, Simon chose to hand-write the PR description himself as an exercise in conciseness and keeping himself honest.

2026-04-15

Sources

AI Deployment Realities & The Open Source Security Squeeze — 2026-04-15#

Highlights#

Today’s discourse reveals a sobering maturation in the AI space, shifting the focus from model hype to the gritty mechanics of practical deployment and the resulting friction,,. While enterprises are defining net-new technical roles and methodologies to integrate agents successfully, the community is simultaneously grappling with a rising backlash against AI “workslop” and the realization that AI-driven automated exploitation is actively forcing companies to close their open-source codebases-,,-.

2026-04-15

Sources

AI Reddit — 2026-04-15#

The Buzz#

A fascinating shift in prompt injection strategies has surfaced, proving that the most effective attacks no longer rely on technical overrides but instead weaponize a model’s own alignment training. Researchers analyzing over 1,400 injection attempts discovered that framing requests as moral compliance tests or ethical hypotheticals forces models to willingly leak their system prompts and secrets. This revelation suggests that a model’s inherent helpfulness and ethical reasoning are actually its largest attack surfaces, rendering traditional keyword-based defenses largely obsolete.

2026-04-15

Simon Willison — 2026-04-15#

Highlight#

The standout exploration today is Simon’s hands-on dive into Google’s new Gemini 3.1 Flash TTS API. It perfectly captures his rapid-prototyping ethos: encountering a surprisingly complex new prompting paradigm for an audio model and immediately using Gemini 3.1 Pro to “vibe code” a UI to stress-test regional British accents.

Posts#

Gemini 3.1 Flash TTS Google released Gemini 3.1 Flash TTS, an audio-only output model controlled via standard Gemini API prompts. Simon points out that the prompting guide is highly unusual, so he put it to the test by prompting for charismatic Newcastle and Exeter accents. To speed up his experimentation, he used Gemini 3.1 Pro to instantly vibe code a custom UI for the API.

2026-04-16

Sources

The Agentic Leap: Claude 4.7, Perplexity’s ‘Personal Computer’, and Codex Computer Use — 2026-04-16#

Highlights#

Today’s dominant signal is the rapid maturation of agentic capabilities and local computer orchestration. With massive updates to OpenAI’s Codex and Anthropic’s release of Claude Opus 4.7, models are increasingly breaking out of the chat interface to operate GUIs, manage local file systems, and execute complex workflows directly on our machines.

2026-04-16

Sources

AI Reddit — 2026-04-16#

The Buzz#

The community finally has hard data to back up the “vibes” that Claude Code got perceptibly worse recently. An AMD engineer analyzed over 6,800 sessions and proved that Anthropic silently dropped the default thinking effort to ‘medium’, causing a massive spike in blind edits and unexpected API costs. It is a stark reminder that relying on a single frontier model with zero fallback is a massive liability when lab behavior changes unannounced.

2026-04-16

Simon Willison — 2026-04-16#

Highlight#

The most fascinating takeaway today is a surprising win for local AI: a 21GB quantized Qwen3.6 model running on a laptop beat Anthropic’s brand-new Claude Opus 4.7 at Simon’s “pelican riding a bicycle” SVG generation benchmark. This result leads Simon to conclude that his joke benchmark’s long-standing correlation with a model’s general utility has finally broken down.

Posts#

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 · Source Simon put the day’s two major model releases—Alibaba’s Qwen3.6-35B-A3B and Anthropic’s Claude Opus 4.7—through his infamous “pelican riding a bicycle” SVG generation benchmark. Running locally on a MacBook Pro via LM Studio, the quantized Qwen model produced a better bicycle frame than Opus, and even won a “secret backup test” generating a flamingo riding a unicycle. Simon admits this breaks the historical correlation between his SVG benchmark and a model’s general usefulness, noting he highly doubts the 21GB local model is actually more capable than Anthropic’s proprietary flagship.