● builder: If you're building video search or highlight-extraction tools, current MLLMs are demonstrably unfit for multi-event queries; this benchmark quantifies the gap.
● researcher: The new metrics and dataset provide a rigorous evaluation surface for temporal reasoning in video MLLMs that existing benchmarks completely miss.