[HUGGINGFACE] score: 0.55

Chunk-Level Guided Generation Uses LLM Likelihoods as Training-Free PRM

May 31, 2026

A small model samples fixed-length candidate chunks at each generation step, and a larger off-the-shelf LLM scores them via likelihood without generating text, steering the small model away from incorrect reasoning paths before they propagate. This eliminates the need for step-level reward model training while outperforming best-of-N selection on math reasoning benchmarks.
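The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `guided_generate`, `sample_chunk`, and `score_chunk` are hypothetical names, the two models are replaced by toy stand-ins, and scoring by summed log-likelihood with greedy selection of the top chunk is an assumption about how the pieces fit together.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not taken from the paper):
#   Sampler: (prefix, chunk_len) -> one candidate chunk from the small model
#   Scorer:  (prefix, chunk) -> log-likelihood of the chunk under the large model
Sampler = Callable[[Sequence[str], int], List[str]]
Scorer = Callable[[Sequence[str], Sequence[str]], float]


def guided_generate(
    prefix: Sequence[str],
    sample_chunk: Sampler,
    score_chunk: Scorer,
    num_candidates: int = 4,
    chunk_len: int = 3,
    max_chunks: int = 4,
    eos: str = "<eos>",
) -> List[str]:
    """Extend `prefix` chunk by chunk, keeping the candidate the scorer prefers."""
    out = list(prefix)
    for _ in range(max_chunks):
        # The small model proposes several fixed-length continuations.
        candidates = [sample_chunk(out, chunk_len) for _ in range(num_candidates)]
        # The large model only scores them; it never generates text itself.
        best = max(candidates, key=lambda chunk: score_chunk(out, chunk))
        out.extend(best)
        if eos in best:
            break
    return out


def make_toy_sampler(seed: int = 0) -> Sampler:
    """Toy stand-in for the small model: uniform sampling over a tiny vocabulary."""
    rng = random.Random(seed)
    vocab = ["1", "2", "+", "=", "?"]

    def sample_chunk(prefix: Sequence[str], chunk_len: int) -> List[str]:
        return [rng.choice(vocab) for _ in range(chunk_len)]

    return sample_chunk


def toy_log_likelihood(prefix: Sequence[str], chunk: Sequence[str]) -> float:
    """Toy stand-in for the large model: a fixed per-token probability table."""
    probs = {"1": 0.3, "2": 0.3, "+": 0.2, "=": 0.15, "?": 0.05}
    return sum(math.log(probs[token]) for token in chunk)


if __name__ == "__main__":
    print(guided_generate(["1", "+"], make_toy_sampler(), toy_log_likelihood))
```

Because every candidate has the same length, summed log-likelihoods are directly comparable without length normalization. With real models, `score_chunk` would be a single forward pass of the large model over the prefix plus the candidate chunk.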


HOW THIS AFFECTS YOU

● builder: You can improve a small model's math accuracy at inference time by using an existing large model as a scorer, with no additional training or labeled data.

● researcher: This is a practical training-free baseline that challenges the necessity of process reward models (PRMs) for process-level guidance in mathematical reasoning.

SOURCE

https://huggingface.co/papers/2606.01682
