[HUGGINGFACE] score: 0.80
Unsupervised Process Reward Models
May 10, 2026
Hugging Face proposes unsupervised Process Reward Models (uPRMs), which learn step-level reward functions for LLM reasoning without human annotations by deriving the scoring functions from the LLM's own next-token predictions.
paper