[arXiv] score: 0.12

EELMA Uses Information-Theoretic Empowerment to Evaluate LM Agents Without Benchmarks

May 29, 2026

EELMA estimates agent empowerment, an information-theoretic measure of an agent's influence over future states, from multi-turn text interactions. The estimate is reported to correlate strongly with task performance across textual games and web/tool-use environments, and the approach offers a scalable alternative to manually designed benchmarks for comparing LM agents.
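To make "influence over future states" concrete: empowerment is classically defined as the channel capacity from an agent's actions to its future states, and the mutual information I(A; S') measured under the agent's own action distribution is a lower bound on it. The sketch below is a toy illustration only, not EELMA's estimator. It assumes discrete, hashable actions and states (EELMA works on free-form text), and the function name `mutual_information_bits` and the sample data are hypothetical.

```python
import math
from collections import Counter


def mutual_information_bits(pairs):
    """Plug-in estimate of I(A; S') in bits from (action, next_state) samples.

    Assumption: actions and states are discrete and hashable. This is a toy
    counting estimator, not the method used by EELMA.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (action, next_state) sample")

    joint = Counter(pairs)                    # counts of (a, s')
    actions = Counter(a for a, _ in pairs)    # counts of a
    states = Counter(s for _, s in pairs)     # counts of s'

    mi = 0.0
    for (a, s), c in joint.items():
        # p(a,s') * log2( p(a,s') / (p(a) * p(s')) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (actions[a] * states[s]))
    return mi


# Hypothetical agent whose action fully determines the next state.
controlled = [("left", "room_A"), ("right", "room_B")] * 50

# Hypothetical agent whose action has no effect on the next state.
ignored = [
    ("left", "room_A"), ("left", "room_B"),
    ("right", "room_A"), ("right", "room_B"),
] * 25

print(mutual_information_bits(controlled))  # high: actions control outcomes
print(mutual_information_bits(ignored))     # zero: actions are irrelevant
```

In this toy setting the first agent scores one bit and the second scores zero, which is the intuition behind using such a quantity to compare agents without task labels. A counting estimator like this does not scale to text, so a practical pipeline would need a learned estimator over text representations.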

cs.AI, cs.LG

HOW THIS AFFECTS YOU

● builder: Potentially useful for automated agent evaluation pipelines where designing task-specific benchmarks is expensive, though practical integration details are not yet established.

● researcher: Empowerment's correlation with task performance across diverse environments suggests it could serve as a cheap proxy metric for agent capability, without requiring ground-truth task labels.

SOURCE

https://arxiv.org/abs/2509.22504
