[HUGGINGFACE] score: 0.36
Flat-Pack Bench Tests Fine-Grained Step-Level Video Understanding in LVLMs
May 19, 2026
Flat-Pack Bench uses furniture assembly videos to evaluate step-by-step spatio-temporal reasoning in large vision-language models, targeting a gap in existing benchmarks that focus on coarse action classification and captioning. The benchmark requires models to track procedural state across visually similar steps.
HOW THIS AFFECTS YOU
● Researcher: Provides a harder evaluation surface for LVLMs on procedural video understanding, where current coarse benchmarks show ceiling effects.