[HUGGINGFACE]score: 0.48
ACTS: MDP Controller Steers Frozen Reasoner with Budget-Aware Strategy Phrases
June 1, 2026
Agentic Chain-of-Thought Steering (ACTS) frames inference-time reasoning control as an MDP where a controller agent observes the trace and token budget, then issues strategy-plus-phrase steering actions to a frozen reasoner. This enables explicit control over reasoning strategy at inference time without modifying the base model.
paper
HOW THIS AFFECTS YOU
●
builderYou can apply this to reduce token waste in chain-of-thought pipelines while gaining explicit inference-time control over reasoning behavior without retraining.
●
researcherACTS introduces a formalized MDP framing for reasoning control that separates controller policy from base model, enabling cleaner ablations on reasoning strategy.