[arXiv] score: 0.58

CoVRL Couples Variational Inference with RL to Enable Verifier-Free LLM Reasoning Training

May 26, 2026

CoVRL constructs a composite distribution integrating prior (question-conditioned) and posterior (answer-conditioned) sampling to couple reasoning trace generation with answer information, improving exploration efficiency and trace-answer coherence in verifier-free RL for LLM reasoning.
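To make the idea of a composite distribution concrete, here is a minimal sketch under a strong assumption: that the composite can be read as a two-component mixture of a question-conditioned prior and an answer-conditioned posterior over candidate reasoning traces. The summary does not say how CoVRL actually combines the two, so the mixing weight `alpha`, the function names, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import random


def composite_probs(prior, posterior, alpha):
    """Mixture over traces: alpha * posterior + (1 - alpha) * prior.

    ASSUMPTION: a simple mixture stands in for the paper's composite
    distribution; the actual construction may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    traces = set(prior) | set(posterior)
    return {
        t: alpha * posterior.get(t, 0.0) + (1.0 - alpha) * prior.get(t, 0.0)
        for t in traces
    }


def sample_trace(prior, posterior, alpha, rng):
    """Choose the posterior with probability alpha, else the prior, then sample a trace."""
    source = posterior if rng.random() < alpha else prior
    traces = list(source)
    weights = [source[t] for t in traces]
    return rng.choices(traces, weights=weights, k=1)[0]


# Toy numbers invented for illustration; they do not come from the paper.
prior = {"trace_a": 0.7, "trace_b": 0.2, "trace_c": 0.1}      # sees the question only
posterior = {"trace_a": 0.1, "trace_b": 0.3, "trace_c": 0.6}  # also sees the answer

mixed = composite_probs(prior, posterior, alpha=0.5)
rng = random.Random(0)
samples = [sample_trace(prior, posterior, 0.5, rng) for _ in range(5)]
```

In this toy setup, traces that the answer-conditioned posterior favors gain probability mass under the mixture, which is one way sampling could be steered toward traces coherent with the answer while still drawing from the prior.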

cs.CL, cs.AI

HOW THIS AFFECTS YOU

● Researcher: The variational inference framing of reasoning trace sampling is a principled improvement over naive verifier-free RL methods that decouple traces from answers.

SOURCE

https://arxiv.org/abs/2512.12576
