[arXiv]

RLVR Model Merging Fails Due to Near-Orthogonal Sparse Parameter Updates

June 18, 2026

RLVR (reinforcement learning with verifiable rewards) post-training produces sparse parameter updates, but these updates are spread farther apart in weight space than SFT (supervised fine-tuning) updates. They form near-orthogonal directions that make model merging fragile, which is the opposite of what their sparsity would suggest. The authors attribute the effect to RL stochasticity and the diversity of emergent reasoning patterns, and conclude that training-free capability aggregation from RLVR models is unreliable.
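To make the geometry concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not the paper's method or data. It builds two synthetic sparse updates on a shared base, measures their sparsity and cosine similarity, and merges them by plain weight averaging. The helper names (`task_vector`, `sparsity`, `cosine`, `average_merge`, `synthetic_sparse_update`) and the setup, with 5% random supports on a 10,000-parameter vector, are hypothetical. Real checkpoints would need per-tensor handling.

```python
import math
import random

def task_vector(base, tuned):
    """Update (delta) of a fine-tuned model relative to the shared base weights."""
    return [t - b for b, t in zip(base, tuned)]

def sparsity(delta, tol=1e-8):
    """Fraction of parameters left (essentially) unchanged by the update."""
    return sum(abs(d) <= tol for d in delta) / len(delta)

def cosine(a, b):
    """Cosine similarity between two update vectors; ~0 means near-orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def average_merge(base, deltas):
    """Plain weight averaging: base + mean of the deltas (one common merging baseline)."""
    n = len(deltas)
    return [b + sum(d[i] for d in deltas) / n for i, b in enumerate(base)]

def synthetic_sparse_update(dim, density, rng):
    """Hypothetical sparse update touching a random `density` fraction of coordinates."""
    delta = [0.0] * dim
    for i in rng.sample(range(dim), int(dim * density)):
        delta[i] = rng.gauss(0.0, 1.0)
    return delta

# Toy setup (hypothetical): a shared base and two specialists whose sparse
# updates touch independently chosen 5% subsets of the parameters.
rng = random.Random(0)
dim = 10_000
base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
delta_a = synthetic_sparse_update(dim, 0.05, rng)
delta_b = synthetic_sparse_update(dim, 0.05, rng)
tuned_a = [w + d for w, d in zip(base, delta_a)]
tuned_b = [w + d for w, d in zip(base, delta_b)]

# Recover the updates from the weights, as one would for real checkpoints.
tv_a = task_vector(base, tuned_a)
tv_b = task_vector(base, tuned_b)
merged = average_merge(base, [tv_a, tv_b])
merged_delta = task_vector(base, merged)

# Fraction of specialist A's update that survives along A's own direction.
retention_a = sum(m * a for m, a in zip(merged_delta, tv_a)) / sum(a * a for a in tv_a)

print(f"sparsity(A)={sparsity(tv_a):.3f}  cos(A,B)={cosine(tv_a, tv_b):+.3f}  "
      f"retention(A)={retention_a:.3f}")
```

With independent random supports, the two updates share only a small number of coordinates, so their cosine similarity lands near zero. Plain averaging then keeps each update at roughly half strength along its own direction. This dilution is one simple way averaging can go wrong for near-orthogonal updates. The paper's own explanation centers on RL stochasticity and diverse reasoning patterns, which this toy does not model.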

HOW THIS AFFECTS YOU

● Researcher: This argues against model merging as a cheap way to combine RLVR-trained reasoning specialists, and motivates studying alternative aggregation methods such as ensemble routing.
