Lean Proof Assistant as Process Oracle Improves RL for Theorem Proving
June 19, 2026
Using Lean's elaboration engine as a tactic-level process oracle, this approach supplies dense, verifier-grounded credit signals during GRPO-style RL training rather than relying on a single binary outcome reward. First-error propagation and first-token credit methods balance outcome- and process-level advantages, improving formal theorem proving performance.
HOW THIS AFFECTS YOU
●
researcherThe tactic-level credit assignment method is directly applicable to any formal reasoning domain with a symbolic verifier, offering a concrete alternative to sparse outcome-only RL signals.