[arXiv] score: 0.24

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

May 7, 2026
FREIA, an unsupervised RL algorithm for LLMs, combines Free Energy-Driven Rewards, which balance the tradeoff between consensus and exploration, with Adaptive Advantage Shaping, which dynamically adjusts learning signals based on reward statistics. It improves over unsupervised baselines across nine datasets spanning mathematical reasoning, commonsense reasoning, and logical inference.
cs.CL, cs.ET, cs.LG