[HUGGINGFACE] score: 0.69
LVSA Cuts Long-Video Diffusion Compute 3.17x with Training-Free Sparse Attention
May 28, 2026
Long Video Sparse Attention (LVSA) applies block-sparse attention with structured windows and rotating global anchors to video diffusion transformers, reducing compute by up to 3.17x on Wan 2.1 without retraining. It also eliminates the frozen-frame degradation that occurs beyond training-horizon lengths by removing fixed-grid bias.
HOW THIS AFFECTS YOU
● builder: Drop-in compute reduction of up to 3.17x at inference for Wan 2.1 video diffusion, with no retraining required, directly lowering the cost of long-video generation in production pipelines.
● researcher: Identifies fixed-grid bias as the cause of the frozen-frame artifacts that appear beyond the training horizon and resolves it with rotating global anchors, a potentially transferable insight for other video transformer architectures.
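
To make the idea of block-sparse attention with local windows and rotating global anchors concrete, here is a minimal sketch in Python. It is not LVSA's implementation: the function names, the evenly spaced anchor layout, and the choice to rotate anchors by a per-layer `shift` are all assumptions made for illustration, and the printed density is a property of this toy mask, not a reproduction of the reported 3.17x figure.

```python
import numpy as np

def block_mask(n_blocks, window, n_anchors, shift):
    """Hypothetical block-level attention mask.

    Returns a boolean [n_blocks, n_blocks] array where True means the
    query block (row) is allowed to attend to the key block (column).
    """
    idx = np.arange(n_blocks)
    # Structured local window: each query block sees key blocks
    # within `window` positions of itself.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    # Global anchors: evenly spaced key blocks visible to every query.
    # `shift` moves the anchor set (an assumed form of "rotation"),
    # so no fixed grid of positions is always privileged.
    stride = n_blocks // n_anchors
    anchors = (np.arange(n_anchors) * stride + shift) % n_blocks
    mask[:, anchors] = True
    return mask

def density(mask):
    """Fraction of block pairs that are actually computed."""
    return float(mask.mean())

if __name__ == "__main__":
    for layer in range(3):
        m = block_mask(n_blocks=64, window=2, n_anchors=4, shift=layer)
        anchor_cols = np.flatnonzero(m.all(axis=0))
        print(f"layer {layer}: anchors {anchor_cols}, density {density(m):.3f}")
```

In this toy setup the attention cost scales roughly with the mask density, so a lower density means fewer block pairs to compute; the actual savings in a real model depend on the kernel, the block size, and the non-attention layers.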