[arXiv] score: 0.12

EVA-Bench: End-to-End Evaluation Framework for Voice Agents

May 29, 2026

EVA-Bench evaluates voice agents via bot-to-bot audio conversations with automatic simulation validation and two composite metrics: EVA-A, covering task completion and speech fidelity, and EVA-X, covering conversation flow, conciseness, and turn-taking. It targets a gap in current evaluation: no existing benchmark jointly handles realistic multi-turn simulation and voice-specific failure modes.

cs.SD, cs.AI, cs.CL, cs.LG

HOW THIS AFFECTS YOU

● builder: You can use EVA-Bench to stress-test voice agent pipelines against realistic multi-turn audio conversations and measure failure modes beyond simple task completion.

● researcher: EVA-A and EVA-X provide a structured evaluation schema for voice agent research covering dimensions absent from text-based benchmarks.

SOURCE

https://arxiv.org/abs/2605.13841
