[HUGGINGFACE] score: 0.69

LVSA Cuts Long-Video Diffusion Compute 3.17x with Training-Free Sparse Attention

May 28, 2026

Long Video Sparse Attention (LVSA) applies block-sparse attention with structured windows and rotating global anchors to video diffusion transformers, reducing compute by up to 3.17x on Wan 2.1 without retraining. It also eliminates the frozen-frame degradation that occurs beyond training-horizon lengths by removing fixed-grid bias.
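To make the attention pattern concrete, the sketch below builds a block-level mask that combines a local window with a small set of global anchor blocks whose positions shift with a step counter. It is only an illustration under stated assumptions: the function names, the even anchor spacing, and the rotation rule are hypothetical and are not taken from the paper, which may define its windows and anchors differently.

```python
# Illustrative sketch only: anchor spacing and rotation rule are assumptions,
# not the LVSA authors' actual scheme.

def anchor_blocks(num_blocks, num_anchors, step):
    """Return evenly spaced anchor block indices, shifted by `step` (hypothetical rule)."""
    if num_anchors <= 0:
        return set()
    stride = max(1, num_blocks // num_anchors)
    return {(i * stride + step) % num_blocks for i in range(num_anchors)}


def block_sparse_mask(num_blocks, window, num_anchors, step):
    """Build a num_blocks x num_blocks boolean mask.

    mask[q][k] is True when query block q may attend to key block k, i.e. when
    k lies within `window` blocks of q or k is one of the current anchors.
    """
    anchors = anchor_blocks(num_blocks, num_anchors, step)
    return [
        [abs(q - k) <= window or k in anchors for k in range(num_blocks)]
        for q in range(num_blocks)
    ]


def density(mask):
    """Fraction of block pairs that are actually computed."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


if __name__ == "__main__":
    for step in range(3):
        mask = block_sparse_mask(num_blocks=8, window=1, num_anchors=2, step=step)
        print(f"step {step}: anchors={sorted(anchor_blocks(8, 2, step))}, "
              f"density={density(mask):.3f}")
```

Because the anchor set changes with `step`, no single block position is treated as global on every pass, which is one plausible way to avoid a fixed grid. The density value gives a rough upper bound on attention savings at block level; real speedups depend on the kernel and on the rest of the model.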

paper

HOW THIS AFFECTS YOU

● builder: Drop-in compute reduction of up to 3.17x for Wan 2.1 video diffusion with no retraining required, directly reducing cost for long-video generation pipelines in production.

● researcher: Identifies fixed-grid bias as the cause of long-range temporal artifacts and resolves it with rotating global anchors, a transferable insight for other video transformer architectures.

SOURCE

https://huggingface.co/papers/2605.31057
