
What happens when you run a CUDA kernel?

June 29, 2026

A detailed walkthrough traces the full execution path of a simple CUDA vector-addition kernel on an RTX 4090. It starts with nvcc's multi-stage compilation pipeline: cicc lowers the CUDA C++ to PTX, ptxas assembles the PTX into SASS, and the result is embedded in a fatbinary. It then follows roughly 900 ioctl calls and a memory-mapped doorbell register, ending with the kernel's warps actually executing. The piece covers PTX as a virtual ISA, SASS instruction encoding, the CUDA driver's role in setting up a GPU context, and how thread blocks are scheduled onto streaming multiprocessors (SMs).
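The sketch below gives a rough picture of the indexing that a vector-add kernel and its thread blocks rely on. It is a host-side C++ emulation of a typical one-dimensional vector-add launch, not code from the article, and it runs entirely on the CPU. The names `LaunchConfig`, `make_config`, `vector_add_thread`, `emulate_launch`, and `warps_per_block` are hypothetical stand-ins for CUDA's built-in `gridDim`, `blockDim`, `blockIdx`, and `threadIdx`. It does not model the driver, the ioctls, the doorbell, or the hardware scheduler's actual block-to-SM policy.

```cpp
// Host-side C++ emulation of the index arithmetic in a typical CUDA vector-add
// kernel. Hypothetical sketch: runs on the CPU and never touches a GPU or driver.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the x components of CUDA's gridDim/blockDim (1-D launch only).
struct LaunchConfig {
    unsigned gridDimX;   // number of thread blocks in the grid
    unsigned blockDimX;  // threads per block
};

constexpr unsigned kWarpSize = 32;  // warp width on NVIDIA GPUs

// Ceiling division: launch just enough blocks to cover n elements.
LaunchConfig make_config(std::size_t n, unsigned threadsPerBlock) {
    // CUDA limits a block to at most 1024 threads.
    assert(threadsPerBlock > 0 && threadsPerBlock <= 1024);
    unsigned blocks = static_cast<unsigned>((n + threadsPerBlock - 1) / threadsPerBlock);
    return {blocks, threadsPerBlock};
}

// What one thread does; mirrors the usual kernel body:
//   int i = blockIdx.x * blockDim.x + threadIdx.x;
//   if (i < n) c[i] = a[i] + b[i];
void vector_add_thread(unsigned blockIdxX, unsigned threadIdxX, unsigned blockDimX,
                       const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = static_cast<std::size_t>(blockIdxX) * blockDimX + threadIdxX;
    if (i < n) {
        c[i] = a[i] + b[i];  // guard: the last block may be only partly full
    }
}

// Run every thread of the grid one after another. Real hardware runs blocks on
// SMs concurrently and in no guaranteed order; sequential emulation gives the
// same result here only because each thread writes a distinct element.
void emulate_launch(LaunchConfig cfg, const float* a, const float* b, float* c,
                    std::size_t n) {
    for (unsigned block = 0; block < cfg.gridDimX; ++block) {
        for (unsigned t = 0; t < cfg.blockDimX; ++t) {
            vector_add_thread(block, t, cfg.blockDimX, a, b, c, n);
        }
    }
}

// A block's threads are grouped into warps of 32 consecutive threadIdx.x values.
unsigned warps_per_block(unsigned blockDimX) {
    return (blockDimX + kWarpSize - 1) / kWarpSize;
}
```

With n = 1000 and 256 threads per block, `make_config` yields 4 blocks (1,024 threads). Each block splits into 8 warps, and the `i < n` guard leaves the last 24 threads with no work.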

Read the original at fergusfinn.com.
