OmniInteract Benchmark Tests Real-Time Omnimodal LLMs on Live Audio-Visual Streams

May 25, 2026

OmniInteract evaluates omnimodal models on 250 videos with 1,430 temporally grounded response slots. It requires online inference over unmodified audio-visual streams in which the queries are embedded in the audio track. Models must detect triggers, decide when to respond, and answer without access to future frames, across real-time, proactive, and nested scenarios.
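To make the online setting concrete, here is a minimal sketch of a causal evaluation loop and a slot-based score. It rests on several assumptions. `Chunk`, `Slot`, `run_online`, `score`, and `toy_policy` are hypothetical names, not OmniInteract's API. Frames and audio are plain strings standing in for real media. A reply counts as correct only if it exactly matches the reference and falls inside the slot's time window, which is a simplification and not the paper's metric.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Chunk:
    t: float     # stream time in seconds when this chunk arrives
    frame: str   # stand-in for video content (hypothetical)
    audio: str   # stand-in for audio content; spoken queries arrive here


@dataclass
class Slot:
    start: float  # earliest acceptable response time
    end: float    # latest acceptable response time
    answer: str   # reference answer for this slot


# A policy sees only the chunks received so far and returns a reply or None (stay silent).
Policy = Callable[[List[Chunk]], Optional[str]]


def run_online(stream: Iterable[Chunk], policy: Policy) -> List[Tuple[float, str]]:
    """Feed chunks strictly in order so the policy never sees future frames."""
    seen: List[Chunk] = []
    responses: List[Tuple[float, str]] = []
    for chunk in stream:
        seen.append(chunk)
        reply = policy(list(seen))  # pass a copy of the causal prefix only
        if reply is not None:
            responses.append((chunk.t, reply))
    return responses


def score(responses: List[Tuple[float, str]], slots: List[Slot]) -> float:
    """Fraction of slots matched by a correct reply inside the slot's time window.

    Each reply can satisfy at most one slot. Exact string match is a simplification.
    """
    used = set()
    hits = 0
    for slot in slots:
        for i, (t, text) in enumerate(responses):
            if i not in used and slot.start <= t <= slot.end and text == slot.answer:
                used.add(i)
                hits += 1
                break
    return hits / len(slots) if slots else 0.0


def toy_policy(seen: List[Chunk]) -> Optional[str]:
    """Answer only when the newest audio contains the trigger; otherwise stay silent."""
    latest = seen[-1]
    if "what color?" in latest.audio:
        return latest.frame
    return None


# Toy stream: the frame turns red at t=2, and the question is spoken at t=2.
stream = [
    Chunk(float(t), "red" if t >= 2 else "blue", "what color?" if t == 2 else "")
    for t in range(6)
]
slots = [Slot(2.0, 3.0, "red")]
responses = run_online(stream, toy_policy)
print(responses)                 # [(2.0, 'red')]
print(score(responses, slots))   # 1.0
```

The window check is what separates this from offline video QA. A correct answer given at t=4.0, after the slot closes, scores zero, just like a wrong answer. A real harness would decode media at wall-clock rate and might grade timing more gradually. The hard window here is only one plausible choice.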

HOW THIS AFFECTS YOU

● Builder: If you're building real-time audio-visual assistants, OmniInteract provides a concrete evaluation harness for streaming response timing and multimodal trigger detection.

● Researcher: The benchmark's native streaming evaluation with embedded audio triggers is a more realistic test of omnimodal latency and trigger detection than offline video QA setups.

SOURCE

https://huggingface.co/papers/2605.26485

← back to feed