[arXiv] score: 0.58
CoVRL Couples Variational Inference with RL to Enable Verifier-Free LLM Reasoning Training
May 26, 2026
CoVRL constructs a composite distribution that combines prior (question-conditioned) and posterior (answer-conditioned) sampling, coupling reasoning-trace generation with answer information. This improves exploration efficiency and trace-answer coherence in verifier-free RL for LLM reasoning.
cs.CL, cs.AI
HOW THIS AFFECTS YOU
● researcher: The variational-inference framing of reasoning-trace sampling is a principled improvement over naive verifier-free RL methods that decouple traces from answers.
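As a rough illustration of the composite sampling idea in the summary, the toy sketch below draws each reasoning-trace prompt from either a question-only (prior) or a question-plus-answer (posterior) template. The summary does not say how CoVRL actually combines the two distributions, so the simple mixture, the weight `alpha`, the prompt templates, and all function names here are assumptions made for illustration, not the paper's method.

```python
import random


def sample_trace_source(alpha, rng):
    """Choose which sampler produces the next trace.

    alpha is a hypothetical mixture weight: the probability of using the
    answer-conditioned (posterior) sampler instead of the prior.
    """
    return "posterior" if rng.random() < alpha else "prior"


def build_prompt(question, answer, source):
    """Build an illustrative prompt; the templates are assumptions."""
    if source == "posterior":
        # Posterior: the trace is conditioned on the question AND the answer.
        return f"Question: {question}\nKnown answer: {answer}\nReasoning:"
    # Prior: the trace is conditioned on the question only.
    return f"Question: {question}\nReasoning:"


def composite_batch(question, answer, n, alpha=0.5, seed=0):
    """Return n (source, prompt) pairs drawn from the toy mixture."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        source = sample_trace_source(alpha, rng)
        batch.append((source, build_prompt(question, answer, source)))
    return batch
```

Calling `composite_batch("What is 2 + 2?", "4", n=8, alpha=0.5)` returns a mix of prior and posterior prompts. Setting `alpha=0.0` reduces to pure question-conditioned sampling, and `alpha=1.0` to pure answer-conditioned sampling.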