[arXiv] score: 0.22

VideoMLA Cuts Video Diffusion KV Cache Memory 92.7% with Low-Rank Latents

May 29, 2026

VideoMLA applies Multi-Head Latent Attention (MLA) to causal video diffusion, replacing per-head KV pairs with a shared low-rank content latent and a decoupled 3D-RoPE positional key. This cuts per-token KV memory by 92.7% while retaining generation quality. Notably, the compression works even though video attention is not low-rank in the spectral sense that motivates MLA in language models.
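To see where a saving of this size can come from, the per-token cache arithmetic can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's configuration: `num_heads`, `head_dim`, `latent_dim`, and `rope_dim` are hypothetical values picked so the ratio lands near the reported figure, and the sketch assumes a standard cache stores one key and one value vector per head while an MLA-style cache stores one shared content latent plus one shared positional key per token.

```python
def kv_floats_per_token(num_heads: int, head_dim: int) -> int:
    """Standard multi-head cache: one key and one value vector per head."""
    return 2 * num_heads * head_dim


def mla_floats_per_token(latent_dim: int, rope_dim: int) -> int:
    """MLA-style cache: one shared content latent plus one shared positional key."""
    return latent_dim + rope_dim


def reduction(num_heads: int, head_dim: int, latent_dim: int, rope_dim: int) -> float:
    """Fraction of per-token cache entries saved by the MLA-style layout."""
    baseline = kv_floats_per_token(num_heads, head_dim)
    compressed = mla_floats_per_token(latent_dim, rope_dim)
    return 1.0 - compressed / baseline


# Hypothetical dimensions (assumed for illustration, not taken from the paper).
example = reduction(num_heads=12, head_dim=128, latent_dim=192, rope_dim=32)
print(f"{example:.1%}")
```

Because the saving is a ratio of per-token sizes, it applies at every cached layer and does not depend on sequence length; the absolute memory saved grows with the number of cached tokens.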

cs.CV, cs.AI

HOW THIS AFFECTS YOU

● builder: A 92.7% KV memory reduction at every cached layer directly reduces streaming memory and latency for long-rollout video generation, making minute-scale video diffusion more feasible on constrained hardware.

● researcher: MLA succeeds in video diffusion despite the high effective rank of the pretrained attention weights. This challenges the standard spectral justification and warrants further investigation into why low-rank compression generalizes here.
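For readers who want to probe the effective-rank claim on their own weights, one common definition (the exponential of the Shannon entropy of the normalized singular values) can be sketched as follows. Whether the paper uses this exact definition is an assumption, and the singular values in the example are made up for illustration.

```python
import math


def effective_rank(singular_values):
    """Entropy-based effective rank: exp(H(p)), where p_i = s_i / sum(s)."""
    if any(s < 0 for s in singular_values):
        raise ValueError("singular values must be non-negative")
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("at least one singular value must be positive")
    # Zero singular values contribute nothing to the entropy, so skip them.
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Made-up spectra: a flat one (high effective rank) and a sharply peaked one.
flat = effective_rank([1.0] * 8)
peaked = effective_rank([1.0] + [1e-6] * 7)
print(flat, peaked)
```

A flat spectrum yields an effective rank close to the matrix dimension, while a spectrum dominated by one singular value yields a value close to 1. In practice the singular values would come from an SVD of a projection matrix or an attention map.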

SOURCE

https://arxiv.org/abs/2605.30351
