Don't Let Gains FADE: Breaking Down Policy Gradient Weights in RL
July 3, 2026
Source: arxiv
Decomposing the advantage function into positive and negative gradient mass reveals how imbalanced updates collapse either policy entropy or weight geometry. FADE (Focal Advantage with Dynamic Entropy) addresses this trade-off by dynamically scheduling the advantage's focus between hard problems and exploration, preventing diversity collapse during RL post-training.
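To make the positive/negative split concrete, here is a minimal sketch. It is not FADE itself and not the paper's exact decomposition. It assumes a group-mean baseline (advantage = reward minus the mean reward of the responses sampled for one prompt), and the function name `advantage_mass` is hypothetical. It sums the advantage weight that pushes sampled responses up and the weight that pushes them down, and counts how many samples carry each.

```python
import numpy as np

def advantage_mass(rewards):
    """Split mean-centered advantages into positive and negative mass.

    Assumption: the baseline is the group mean reward. This is an
    illustrative choice, not necessarily the estimator used by FADE.
    """
    rewards = np.asarray(rewards, dtype=float)
    adv = rewards - rewards.mean()      # per-sample advantage
    pos = adv > 0                       # samples whose likelihood is pushed up
    neg = adv < 0                       # samples whose likelihood is pushed down
    return {
        "pos_mass": adv[pos].sum(),     # total upward weight
        "neg_mass": -adv[neg].sum(),    # total downward weight (as a magnitude)
        "n_pos": int(pos.sum()),
        "n_neg": int(neg.sum()),
    }

# Hard prompt: only 1 of 8 sampled responses is rewarded.
hard = advantage_mass([1, 0, 0, 0, 0, 0, 0, 0])
# Easy prompt: 7 of 8 sampled responses are rewarded.
easy = advantage_mass([1, 1, 1, 1, 1, 1, 1, 0])
print(hard)
print(easy)
```

Under this baseline the two masses are always equal in total, because mean-centered advantages sum to zero. What differs is how the mass is spread: on the hard prompt all positive mass lands on a single response, while on the easy prompt all negative mass lands on a single response. Any scheme that rescales the two sides separately breaks that balance, which is the kind of imbalance the summary refers to.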