[HUGGINGFACE]score: 0.71
ECHO Adds Auxiliary Environment Prediction Loss to CLI Agent RL Training
May 22, 2026
ECHO augments GRPO-style RL for CLI agents with a cross-entropy loss that trains the policy to predict environment observations (stdout, errors, logs), extracting supervision from failed rollouts that standard policy-gradient ignores.
paper
HOW THIS AFFECTS YOU
●
builderYou can apply ECHO's hybrid objective to improve CLI agent training sample efficiency by extracting learning signal from failed rollouts rather than discarding them.
●
researcherUsing the terminal output stream as a free auxiliary supervision signal is a clean insight that could generalize to other tool-use agent RL settings.