[arXiv] score: 0.12
EVA-Bench: End-to-End Evaluation Framework for Voice Agents
May 29, 2026
EVA-Bench evaluates voice agents via bot-to-bot audio conversations with automatic simulation validation and two composite metrics: EVA-A, covering task completion and speech fidelity, and EVA-X, covering conversation flow, conciseness, and turn-taking. It addresses a gap in existing benchmarks, none of which jointly handles realistic multi-turn simulation and voice-specific failure modes.
cs.SD, cs.AI, cs.CL, cs.LG
HOW THIS AFFECTS YOU
● builder: You can use EVA-Bench to stress-test voice agent pipelines against realistic multi-turn audio conversations and measure failure modes beyond simple task completion.
● researcher: EVA-A and EVA-X provide a structured evaluation schema for voice agent research covering dimensions absent from text-based benchmarks.