[HUGGINGFACE] score: 0.48
PEEK Distills Frame Relevance Rankings Into Lightweight Video Sampling Model
May 28, 2026
PEEK trains a lightweight temporal model to predict caption-conditioned frame relevance by distilling rankings from a stronger teacher, replacing uniform sampling in video captioning pipelines. It outperforms state-of-the-art adaptive sampling methods on ActivityNet Captions and MSR-VTT while reducing compute.
HOW THIS AFFECTS YOU
● builder: You can swap uniform frame sampling for PEEK in video captioning pipelines to improve quality without the compute cost of existing adaptive methods.
● researcher: The distillation approach for temporal relevance ranking is applicable beyond captioning to any video-language task bottlenecked by frame selection.
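
To make the idea concrete, here is a minimal sketch of the two pieces the summary describes: replacing uniform sampling with a top-k pick over per-frame relevance scores, and training a student to reproduce a teacher's frame ranking. This is not the PEEK implementation. The function names (`uniform_indices`, `topk_indices`, `pairwise_rank_loss`) are hypothetical, and the pairwise logistic loss is an assumption: it is one common ranking-distillation objective, and the summary does not say which loss PEEK uses.

```python
import math
from typing import List, Sequence


def uniform_indices(num_frames: int, k: int) -> List[int]:
    """Baseline: pick k evenly spaced frames (the center of each segment)."""
    k = min(k, num_frames)
    if k <= 0:
        return []
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def topk_indices(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k highest-scoring frames, returned in temporal order."""
    k = max(0, min(k, len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def pairwise_rank_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Assumed objective: pairwise logistic loss over frame pairs.

    For every pair the teacher ranks as i above j, the student is
    penalized by softplus(-(student[i] - student[j])).
    """
    total, pairs = 0.0, 0
    for i in range(len(teacher)):
        for j in range(len(teacher)):
            if teacher[i] > teacher[j]:
                margin = student[i] - student[j]
                # Numerically stable softplus(-margin).
                total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
                pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Hypothetical relevance scores for an 8-frame clip.
    scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.3, 0.7, 0.2]
    print("uniform:", uniform_indices(len(scores), 4))
    print("top-k:  ", topk_indices(scores, 4))
```

In a captioning pipeline, the indices from `topk_indices` would take the place of those from `uniform_indices` when choosing which frames to pass to the captioner. The scores would come from the trained student model.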