[arXiv]score: 0.15

Lean Proof Assistant as Process Oracle Improves RL for Theorem Proving

June 19, 2026

Using Lean's elaboration engine as a tactic-level process oracle, this approach supplies dense, verifier-grounded credit signals during GRPO-style RL training rather than relying on a single binary outcome reward. First-error propagation and first-token credit methods balance outcome- and process-level advantages, improving formal theorem proving performance.

HOW THIS AFFECTS YOU

●

researcherThe tactic-level credit assignment method is directly applicable to any formal reasoning domain with a symbolic verifier, offering a concrete alternative to sparse outcome-only RL signals.

read original ↗arxiv.org

← back to feed