[r/MachineLearning]
TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R]
May 5, 2026
TritonSigmoid is an open-source Triton kernel implementing padding-aware sigmoid attention. On H100 it reaches 515 TFLOPS, outperforming FlashAttention-2 (361 TFLOPS) and FlashSigmoid (440 TFLOPS).

Unlike softmax, which forces tokens to compete for a fixed probability mass, sigmoid attention scores each query-key pair independently, so many tokens can be strongly activated at once. That property is critical for single-cell genomics, where cells express anywhere from ~200 to 16,000+ genes and co-attention must be non-competitive. The kernel delivers 25% better cell-type separation and prevents the catastrophic training divergence observed with softmax on variable-length biological sequences. Genomics ML engineers and foundation-model researchers working with sparse, high-variance sequence lengths should evaluate it.
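The kernel itself isn't shown in the post, but the math is easy to sketch. Below is a minimal PyTorch reference of padding-aware sigmoid attention (a quadratic-memory sketch, not the fused Triton kernel; the function name and the -log(n) bias are assumptions drawn from the common sigmoid-attention formulation, not from the TritonSigmoid repo). Weights are sigmoid(QK^T / sqrt(d) + b), computed elementwise per query-key pair, with padded key positions zeroed out rather than masked to -inf:

```python
import torch

def sigmoid_attention_ref(q, k, v, key_pad_mask, bias=None):
    """Padding-aware sigmoid attention, O(n^2) reference.

    q, k, v:       (batch, heads, seq, head_dim)
    key_pad_mask:  (batch, seq) bool, True for real tokens, False for padding
    """
    d = q.shape[-1]
    n = q.shape[-2]
    if bias is None:
        # b = -log(n) keeps the expected row mass comparable to softmax at
        # init; a common choice in the sigmoid-attention literature
        # (assumption: TritonSigmoid may use a different bias).
        bias = -torch.log(torch.tensor(float(n), device=q.device))
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / d**0.5 + bias
    attn = torch.sigmoid(scores)  # elementwise: keys don't compete for mass
    # Padding-aware step: zero the weights on padded keys instead of relying
    # on -inf + softmax, so padded positions contribute nothing to the output.
    attn = attn * key_pad_mask[:, None, None, :].to(attn.dtype)
    return torch.einsum("bhqk,bhkd->bhqd", attn, v)

# Usage with variable-length sequences in one padded batch:
b, h, n, d = 2, 4, 128, 64
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
lengths = torch.tensor([[100], [57]])
mask = torch.arange(n)[None, :] < lengths  # (batch, seq) validity mask
out = sigmoid_attention_ref(q, k, v, mask)
```

Note how the elementwise sigmoid is what makes the masking cheap: because there is no row normalization, dropping padded keys is a pure multiply and never produces the NaN/divergence failure modes that softmax can hit when an entire row is masked.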
research