[arXiv] score: 0.12
EELMA Uses Information-Theoretic Empowerment to Evaluate LM Agents Without Benchmarks
May 29, 2026
EELMA estimates agent empowerment — an information-theoretic measure of influence over future states — from multi-turn text interactions, showing strong correlation with task performance across textual games and web/tool-use environments. The approach offers a scalable alternative to manually designed benchmarks for comparing LM agents.
cs.AI, cs.LG
HOW THIS AFFECTS YOU
● builder: Potentially useful for automated agent evaluation pipelines where designing task-specific benchmarks is expensive, though practical integration details are not yet established.
● researcher: Empowerment correlation with task performance across diverse environments suggests it could serve as a cheap proxy metric for agent capability without requiring ground-truth task labels.
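
To make "influence over future states" concrete, here is a minimal toy sketch, not EELMA's estimator. It assumes actions and next states are small discrete labels and computes a plug-in estimate of the mutual information I(A; S') from logged (action, next_state) pairs. Empowerment is usually defined as the maximum of this quantity over action distributions, so the value under the logged action distribution is only a lower bound. The function name `mutual_information` and the example labels are hypothetical. How EELMA handles free-form text interactions is not described in this summary.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(A; S') in bits from (action, next_state) samples.

    Toy illustration only: assumes discrete, hashable actions and states.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (action, next_state) sample")

    joint = Counter(pairs)                     # counts of (a, s')
    actions = Counter(a for a, _ in pairs)     # counts of a
    states = Counter(s for _, s in pairs)      # counts of s'

    mi = 0.0
    for (a, s), c in joint.items():
        # p(a, s') * log2( p(a, s') / (p(a) * p(s')) ), written with counts
        mi += (c / n) * log2(c * n / (actions[a] * states[s]))
    return mi


# An agent whose action fully determines the outcome (high influence).
controlled = [("left", "door_L"), ("right", "door_R")] * 50

# An agent whose action has no effect on the outcome (no influence).
ignored = [("left", "door_L"), ("left", "door_R"),
           ("right", "door_L"), ("right", "door_R")] * 25

mi_controlled = mutual_information(controlled)
mi_ignored = mutual_information(ignored)
```

In this toy setting the first agent's actions carry one bit of information about the next state and the second agent's carry none. That gap is the kind of signal an empowerment-based score would reward, and it needs no task label.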