[arXiv] score: 0.74

ECHO Adds Auxiliary Environment Prediction Loss to GRPO, Improving CLI Agent Training

May 26, 2026

ECHO augments GRPO-style RL training with an auxiliary cross-entropy loss that trains CLI agents to predict terminal observation tokens from their own actions, reusing the same forward pass to extract learning signal from failed rollouts.
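The summary does not give the exact objective, so the following is a hedged sketch of the kind of hybrid loss it describes. It assumes a GRPO-style clipped policy term on agent-generated tokens plus a weighted cross-entropy term on terminal-observation tokens, with both terms read from the same per-token log-probs (one forward pass over the whole trajectory). The names `Rollout`, `group_advantages`, `hybrid_loss`, `aux_weight`, and `clip_eps`, the token-level averaging, and the default weights are all hypothetical. The KL penalty that many GRPO setups include is omitted, and plain Python floats stand in for autodiff tensors, so this shows the arithmetic of the objective, not a training loop.

```python
# Hypothetical sketch: names, weights, and averaging are assumptions, not ECHO's published code.
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rollout:
    """One agent trajectory, tokenized end to end (hypothetical structure)."""
    logprobs: List[float]      # log p(token) under the current policy, from ONE forward pass
    old_logprobs: List[float]  # log p(token) under the policy that sampled the rollout
    is_obs: List[bool]         # True = token came from the terminal, False = agent-emitted token
    reward: float              # sparse outcome reward, e.g. 1.0 solved / 0.0 failed


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def hybrid_loss(group: Sequence[Rollout], aux_weight: float = 0.1,
                clip_eps: float = 0.2) -> Tuple[float, float, float]:
    """Return (total, policy_loss, aux_loss) for one group of rollouts on the same task."""
    advs = group_advantages([ro.reward for ro in group])
    pg_terms: List[float] = []
    obs_nll: List[float] = []
    for ro, adv in zip(group, advs):
        for lp, old_lp, obs in zip(ro.logprobs, ro.old_logprobs, ro.is_obs):
            if obs:
                # Auxiliary term: cross-entropy on the token the environment actually
                # produced. It ignores the reward, so failed rollouts still contribute.
                obs_nll.append(-lp)
            else:
                # Policy term: PPO-style clipped surrogate on agent-generated tokens only.
                ratio = math.exp(lp - old_lp)
                clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
                pg_terms.append(-min(ratio * adv, clipped * adv))
    policy_loss = sum(pg_terms) / len(pg_terms) if pg_terms else 0.0
    aux_loss = sum(obs_nll) / len(obs_nll) if obs_nll else 0.0
    return policy_loss + aux_weight * aux_loss, policy_loss, aux_loss


if __name__ == "__main__":
    # A group in which every rollout failed: all advantages are zero.
    failed = [Rollout(logprobs=[-0.5, -1.0, -2.0], old_logprobs=[-0.5, -1.0, -2.0],
                      is_obs=[False, True, True], reward=0.0) for _ in range(2)]
    total, pg, aux = hybrid_loss(failed)
    print(f"policy={pg:.3f} aux={aux:.3f} total={total:.3f}")
```

In the example group every rollout failed, so the group-relative advantages are all zero and the policy term vanishes, while the observation cross-entropy still yields a nonzero loss. That is the sense in which an auxiliary term of this kind can pull signal out of failed rollouts without sampling anything new.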

cs.LG, cs.CL

HOW THIS AFFECTS YOU

● Builder: You can drop ECHO into existing GRPO-based agent training pipelines without needing additional rollouts, improving sample efficiency on CLI and terminal-based agentic tasks.

● Researcher: Shows that environment observation tokens in agent rollouts carry exploitable supervision signal beyond sparse outcome rewards, and provides a concrete implementation of the hybrid objective.

SOURCE

https://arxiv.org/abs/2605.24517
