Hacker News

Hacker News — 2026-04-19

Top Story

Zero-Copy GPU Inference from WebAssembly on Apple Silicon

On Apple Silicon, a WebAssembly module’s linear memory can be shared directly with the GPU: no copies, no serialization, and no intermediate buffers. By composing mmap, Metal buffers, and Wasmtime’s custom memory allocator, the author ran a 1B-parameter Llama model entirely from a Wasm guest with no copy overhead. This is hardware-sympathetic engineering, and it shows that sandboxed runtimes don’t have to cost performance when the design exploits the chip’s unified memory, where the CPU and GPU share the same physical RAM.
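To make the zero-copy idea concrete, here is a rough analogy in plain Python, not the author's code. It maps one page-aligned anonymous region with mmap and exposes it through two views, one standing in for the Wasm guest's linear memory and one for the GPU buffer. The names `alloc_shared`, `guest_view`, and `device_view` are hypothetical. In the real setup the second view would presumably be a Metal buffer wrapped around the same pages (Metal offers `makeBuffer(bytesNoCopy:length:options:deallocator:)` for page-aligned memory), but that mapping onto the author's approach is an assumption.

```python
import mmap
import struct

WASM_PAGE = 64 * 1024  # WebAssembly linear memory is sized in 64 KiB pages


def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def alloc_shared(num_wasm_pages):
    """Hypothetical helper: an anonymous, page-aligned mapping that stands in
    for the guest's linear memory. No-copy GPU buffers generally require a
    page-aligned base address and a page-multiple length."""
    size = round_up(num_wasm_pages * WASM_PAGE, mmap.PAGESIZE)
    return mmap.mmap(-1, size)


region = alloc_shared(1)

# Two views over the same pages; neither holds a private copy.
guest_view = memoryview(region)              # stand-in for the Wasm guest
device_view = memoryview(region).cast("f")   # stand-in for the GPU buffer (float32)

# The "guest" writes four float32 weights at offset 0 (native byte order).
struct.pack_into("4f", guest_view, 0, 1.0, 2.0, 3.0, 4.0)

# The "device" sees them immediately, with no copy or serialization step.
weights = list(device_view[:4])

# Writes in the other direction are just as visible to the guest.
device_view[0] = 10.0
seen_by_guest = struct.unpack_from("f", guest_view, 0)[0]

print(weights, seen_by_guest)
```

The design point this sketch tries to capture is that both sides hold a pointer to the same physical pages, so "transferring" a tensor is just agreeing on an offset. Because a 64 KiB Wasm page is a whole multiple of the common 4 KiB and 16 KiB OS page sizes, a linear memory allocated this way should already satisfy the alignment that no-copy buffers typically require.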