[r/MachineLearning]
TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R]
May 5, 2026
TritonSigmoid is an open-source Triton kernel implementing padding-aware sigmoid attention. On H100 it reaches 515 TFLOPS, outperforming FlashAttention-2 (361 TFLOPS) and FlashSigmoid (440 TFLOPS).

Unlike softmax, which forces tokens to compete for a fixed probability mass, sigmoid attention scores each query-key pair independently, so many tokens can be strongly activated at once. That property is critical for single-cell genomics, where cells express anywhere from ~200 to 16,000+ genes and co-attention must be non-competitive. The kernel delivers 25% better cell-type separation and prevents the catastrophic training divergence observed with softmax on variable-length biological sequences. Genomics ML engineers and foundation-model researchers working with sparse, high-variance sequence lengths should evaluate it.
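The kernel itself isn't shown in the post, but the math is easy to sketch. Below is a minimal PyTorch reference of padding-aware sigmoid attention (a quadratic-memory sketch, not the fused Triton kernel; the function name and the -log(n) bias are assumptions drawn from the common sigmoid-attention formulation, not from the TritonSigmoid repo). Weights are sigmoid(QK^T / sqrt(d) + b), computed elementwise per query-key pair, with padded key positions zeroed out rather than masked to -inf:

```python
import torch

def sigmoid_attention_ref(q, k, v, key_pad_mask, bias=None):
    """Padding-aware sigmoid attention, O(n^2) reference.

    q, k, v:       (batch, heads, seq, head_dim)
    key_pad_mask:  (batch, seq) bool, True for real tokens, False for padding
    """
    d = q.shape[-1]
    n = q.shape[-2]
    if bias is None:
        # b = -log(n) keeps the expected row mass comparable to softmax at
        # init; a common choice in the sigmoid-attention literature
        # (assumption: TritonSigmoid may use a different bias).
        bias = -torch.log(torch.tensor(float(n), device=q.device))
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / d**0.5 + bias
    attn = torch.sigmoid(scores)  # elementwise: keys don't compete for mass
    # Padding-aware step: zero the weights on padded keys instead of relying
    # on -inf + softmax, so padded positions contribute nothing to the output.
    attn = attn * key_pad_mask[:, None, None, :].to(attn.dtype)
    return torch.einsum("bhqk,bhkd->bhqd", attn, v)

# Usage with variable-length sequences in one padded batch:
b, h, n, d = 2, 4, 128, 64
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))
lengths = torch.tensor([[100], [57]])
mask = torch.arange(n)[None, :] < lengths  # (batch, seq) validity mask
out = sigmoid_attention_ref(q, k, v, mask)
```

Note how the elementwise sigmoid is what makes the masking cheap: because there is no row normalization, dropping padded keys is a pure multiply and never produces the NaN/divergence failure modes that softmax can hit when an entire row is masked.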
research