[HUGGINGFACE] score: 0.55
Chunk-Level Guided Generation Uses LLM Likelihoods as Training-Free PRM
May 31, 2026
A small model samples fixed-length candidate chunks at each generation step, and a larger off-the-shelf LLM scores them by likelihood without generating any text, steering the small model away from incorrect reasoning paths before they propagate. The approach removes the need to train a step-level process reward model (PRM) and outperforms best-of-N selection on math reasoning benchmarks.
paper
HOW THIS AFFECTS YOU
● Builder: You can improve a small model's math accuracy at inference time by using an existing large model as a scorer, with no additional training or labeled data.
● Researcher: This is a practical training-free baseline that challenges the necessity of PRMs for process-level guidance in mathematical reasoning.
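
The chunk-level loop described in the summary can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `guided_generate`, `toy_propose`, and `toy_score` are hypothetical, the two models are replaced by toy stand-ins so the sketch runs without any ML library, and picking the single highest-likelihood chunk at each step is an assumed selection rule (the summary only says the large model scores chunks by likelihood).

```python
import math
import random
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
#   Proposer: (prefix, n, rng) -> n candidate chunks sampled by the small model
#   Scorer:   (prefix, chunk)  -> log-likelihood of chunk given prefix under the large model
Proposer = Callable[[str, int, random.Random], List[str]]
Scorer = Callable[[str, str], float]


def guided_generate(
    prompt: str,
    propose: Proposer,
    score: Scorer,
    num_candidates: int = 4,
    max_steps: int = 8,
    stop: str = "<eos>",
    seed: int = 0,
) -> str:
    """Grow the output chunk by chunk, keeping the candidate the scorer prefers."""
    rng = random.Random(seed)
    text = prompt
    for _ in range(max_steps):
        # The small model proposes several candidate chunks for this step.
        candidates = propose(text, num_candidates, rng)
        # The large model only scores; it never generates text itself.
        # Assumed rule: keep the chunk with the highest log-likelihood.
        best = max(candidates, key=lambda chunk: score(text, chunk))
        text += best
        if stop in best:
            break
    return text


# --- Toy stand-ins so the sketch is runnable without real models ---

def toy_propose(prefix: str, n: int, rng: random.Random) -> List[str]:
    """Stand-in for the small model: draws chunks from a fixed pool."""
    pool = [" 4", " 5", " 6", " 5<eos>"]
    return rng.sample(pool, k=min(n, len(pool)))


def toy_score(prefix: str, chunk: str) -> float:
    """Stand-in for the large model: a fixed table of chunk probabilities."""
    probs = {" 5<eos>": 0.6, " 5": 0.3, " 4": 0.05, " 6": 0.05}
    return math.log(probs.get(chunk, 1e-9))


result = guided_generate("2+3=", toy_propose, toy_score)
```

With real models, `propose` would sample continuations of a fixed token length from the small model, and `score` would sum the large model's token log-probabilities over the chunk in a single forward pass. Because the chunks have equal length, the raw sums are comparable without length normalization.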