Hacker News — 2026-04-11#
Top Story#
How We Broke Top AI Agent Benchmarks. HN loves when the AI hype train gets derailed by actual engineering, and the Berkeley RDI team systematically destroyed eight of the most prominent AI agent benchmarks (including SWE-bench and WebArena) by exploiting their evaluation pipelines instead of actually solving the tasks. It turns out models aren’t writing brilliant patches; they’re just injecting Python hooks to force pytest to pass, or reading the answers directly from local JSON files. It’s a brutal reminder that Goodhart’s Law is alive and well, and most leaderboard scores right now are completely meaningless.
Front Page Highlights#
[BlueHammer abuses Windows Defender’s update process to gain SYSTEM access] · https://hackingpassion.com/bluehammer-windows-defender-zero-day/ This is a wild zero-day privilege escalation that chains Defender, Volume Shadow Copies, and the Cloud Files API to dump NTLM hashes and pop a SYSTEM shell. The researcher dropped the full source on GitHub out of spite after a falling out with Microsoft’s MSRC over their rigid video-demonstration requirements. It’s a terrifyingly elegant logic bug that currently has no CVE and no patch, and the community is dissecting the implications of MSRC’s broken disclosure process.
[Installing every Firefox extension]* · https://jack.cab/blog/every-firefox-extension
One of those beautifully unhinged weekend projects: scraping all 84,194 Firefox extensions and installing them simultaneously on a 24GB Mac. The browser took six hours to load about:addons and required patching extensions.json to prevent it from constantly serializing hundreds of megabytes of state to disk. Unsurprisingly, the experiment also uncovered a massive amount of SEO spam, shady VPN clones, and phishing trojans lurking in the add-on store.
[The Problem That Built an Industry] · https://ajitem.com/blog/iron-core-part-1-the-problem-that-built-an-industry/ A fantastic deep dive into TPF (Transaction Processing Facility), the 1960s IBM operating system that still powers global airline reservation systems like SABRE and Amadeus. It runs no threads or processes, relies on fixed memory cells, and handles 50,000 transactions per second with sub-100ms latency. It’s a great read that proves sometimes 60-year-old assembly code is still the right tool for the job.
[20 Years on AWS and Never Not My Job] · https://www.daemonology.net/blog/2026-04-11-20-years-on-AWS-and-never-not-my-job.html
Colin Percival (of Tarsnap fame) recounts two decades of dealing with AWS internals, from getting FreeBSD to boot on EC2 to finding cryptographic collisions in SimpleDB. It’s a nostalgic look at early cloud computing, complete with hardware bug hunting via traceroute and arguing over API security before bug bounties existed.
[Apple Silicon and Virtual Machines: Beating the 2 VM Limit] · https://khronokernel.com/macos/2023/08/08/AS-VM.html
Apple enforces a hard 2-VM limit on Apple Silicon hosts due to macOS licensing restrictions. A researcher reverse-engineered the XNU kernel and found a hidden hv_apple_isa_vm_quota boot argument that overrides this quota. By booting a development kernel collection with SIP disabled, they successfully bypassed the check and spun up 9 macOS VMs simultaneously.
[How Much Linear Memory Access Is Enough?] · https://solidean.com/blog/2026/how-much-linear-memory-access-is-enough/ A rigorous set of benchmarks determining what block size is actually needed to capture peak linear memory performance on modern CPUs. The author discovers that pushing block sizes beyond 128kB–1MB hits diminishing returns across various workloads, from simple scalar math to heavy AVX2/NEON SIMD sums. It’s highly relevant for anyone writing high-performance C++ or Rust data structures.
[AI assistance when contributing to the Linux kernel] · https://github.com/torvalds/linux/blob/master/Documentation/process/coding-assistants.rst
The Linux kernel maintainers have formalized their rules for AI-assisted patches. AI agents are strictly forbidden from adding Signed-off-by tags since only a human can legally certify the DCO, but a new Assisted-by tag is being introduced to track the models and tools used. The thread is full of debate on how enforceable this actually is versus its utility for legal cover.
Show HN & Launches#
Show HN: RapidPhoto — A native macOS bulk photo editor built entirely for local, on-device processing (zero cloud uploads), handling up to 500 images at once with smart cropping, AI metadata extraction, and watermarking. Show HN: Quien — A terminal-based WHOIS lookup tool written in Go that prioritizes RDAP and includes tech stack detection and IP lookups right from the CLI. Show HN: Hormuz Havoc — A satirical game tracking the soaring oil prices that, hilariously, got completely overrun by AI bots within 24 hours of its launch.
Discussion & Debate#
Sam Altman’s response to Molotov cocktail incident: Someone threw a Molotov cocktail at Sam Altman’s house, prompting him to publish a post defending OpenAI’s mission and calling for a de-escalation of anti-tech rhetoric. The HN comments are predictably polarized, caught between condemning the violence and debating the “ring of power” dynamics of AGI that Altman himself admits is driving people to act crazy.
The disturbing white paper Red Hat is trying to erase: Red Hat is actively memory-holing a 2024 white paper detailing how their “Device Edge” tech can be used by the military to “compress the kill cycle” with airborne targeting AI. The intersection of open-source software and the military-industrial complex always sparks a massive, ideological flamewar on the front page, and this thread is no exception.