[HUGGINGFACE] score: 0.29
OmniInteract Benchmark Tests Real-Time Omnimodal LLMs on Live Audio-Visual Streams
May 25, 2026
OmniInteract evaluates omnimodal models on 250 videos with 1,430 temporally grounded response slots. It requires online inference over unmodified audio-visual streams in which the queries are embedded in the audio track. Models must detect triggers, decide when to respond, and answer without access to future frames, across real-time, proactive, and nested scenarios.
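To make the online constraint concrete, here is a minimal Python sketch of a causal streaming loop with a slot check. It is an illustration under stated assumptions, not the benchmark's actual harness or metric: `Chunk`, `Slot`, `run_online`, `slot_hits`, and `toy_model` are hypothetical names, the text payload stands in for real audio-visual data, and the rule "a slot is hit if any response lands inside its time window" is an assumed simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Chunk:
    """One segment of the stream; `payload` stands in for audio + video data."""
    t_start: float
    t_end: float
    payload: str


@dataclass
class Slot:
    """A time window in which a response is expected (hypothetical format)."""
    start: float
    end: float


def run_online(
    chunks: List[Chunk],
    model_step: Callable[[List[Chunk]], Optional[str]],
) -> List[Tuple[float, str]]:
    """Feed chunks in order; the model only ever sees the past and present."""
    history: List[Chunk] = []
    responses: List[Tuple[float, str]] = []
    for chunk in chunks:
        history.append(chunk)
        # Pass a copy so the model cannot reach chunks that arrive later.
        out = model_step(list(history))
        if out is not None:
            # The response is timestamped at the end of the chunk that triggered it.
            responses.append((chunk.t_end, out))
    return responses


def slot_hits(responses: List[Tuple[float, str]], slots: List[Slot]) -> List[bool]:
    """Assumed rule: a slot counts as hit if any response time falls inside it."""
    return [any(s.start <= t <= s.end for t, _ in responses) for s in slots]


def toy_model(history: List[Chunk]) -> Optional[str]:
    """Stand-in model: treats a '?' in the newest chunk as a spoken query."""
    latest = history[-1]
    if "?" in latest.payload:
        return f"answer to: {latest.payload}"
    return None  # stay silent when no trigger is detected


# Four one-second chunks; two of them contain a query.
stream = [
    Chunk(0.0, 1.0, "hello"),
    Chunk(1.0, 2.0, "what is this?"),
    Chunk(2.0, 3.0, "silence"),
    Chunk(3.0, 4.0, "and now?"),
]
slots = [Slot(1.5, 3.0), Slot(5.0, 6.0)]

responses = run_online(stream, toy_model)
hits = slot_hits(responses, slots)
```

In this toy run the model answers at the end of the second and fourth chunks, so only the first slot is hit. A real evaluation would also need to judge answer content, which this sketch leaves out.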
HOW THIS AFFECTS YOU
● Builder: If you're building real-time audio-visual assistants, OmniInteract provides a concrete evaluation harness for streaming response timing and multimodal trigger detection.
● Researcher: The benchmark's native streaming evaluation with embedded audio triggers is a more realistic test of omnimodal latency and trigger detection than offline video QA setups.