PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models

May 19, 2026

PlanningBench introduces a controllable planning benchmark that generates scalable, verifiable planning data for evaluating and training LLMs. Unlike static benchmarks, it ties difficulty to structural sources rather than to surface proxies, and it supports automatic verification. It is relevant to teams building or evaluating reasoning-heavy LLM systems in which planning capability is a bottleneck.
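This summary does not describe how PlanningBench builds its tasks. As a purely hypothetical illustration of the two ideas it highlights, structural difficulty and automatic verification, the toy sketch below generates dependency-ordering tasks whose difficulty knob is the length of the longest prerequisite chain (a structural property, not a surface proxy such as prompt length), and checks candidate plans automatically. The functions `generate_task`, `longest_chain`, and `verify_plan` and the choice of chain depth as the difficulty measure are assumptions made for this example, not the paper's method.

```python
import random
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical task format: each step maps to the set of steps that must precede it.
Task = Dict[str, Set[str]]


def generate_task(num_steps: int, depth: int, seed: int = 0) -> Task:
    """Toy generator (assumption): `depth` fixes the longest prerequisite chain."""
    if not 1 <= depth <= num_steps:
        raise ValueError("need 1 <= depth <= num_steps")
    rng = random.Random(seed)
    steps = [f"s{i}" for i in range(num_steps)]
    # The first `depth` steps cover levels 0..depth-1; the rest get random levels.
    level = {s: (i if i < depth else rng.randrange(depth)) for i, s in enumerate(steps)}
    by_level: Dict[int, List[str]] = {}
    for s in steps:
        by_level.setdefault(level[s], []).append(s)
    task: Task = {}
    for s in steps:
        lv = level[s]
        if lv == 0:
            task[s] = set()
            continue
        # One prerequisite from the level directly below makes the chain depth exact...
        prereqs = {rng.choice(by_level[lv - 1])}
        # ...and up to two extra prerequisites from any lower level add branching.
        lower = [t for t in steps if level[t] < lv]
        prereqs.update(rng.sample(lower, k=rng.randint(0, min(2, len(lower)))))
        task[s] = prereqs
    return task


def longest_chain(task: Task) -> int:
    """Number of steps in the longest prerequisite chain (the structural difficulty)."""
    memo: Dict[str, int] = {}

    def chain(s: str) -> int:
        if s not in memo:
            memo[s] = 1 + max((chain(p) for p in task[s]), default=0)
        return memo[s]

    return max(chain(s) for s in task)


def verify_plan(task: Task, plan: List[str]) -> bool:
    """Accept a plan only if it lists every step once and respects every prerequisite."""
    if sorted(plan) != sorted(task):
        return False
    done: Set[str] = set()
    for step in plan:
        if not task[step] <= done:  # some prerequisite has not been executed yet
            return False
        done.add(step)
    return True


if __name__ == "__main__":
    task = generate_task(num_steps=8, depth=4, seed=1)
    print(longest_chain(task))  # 4
    plan = list(TopologicalSorter(task).static_order())  # a reference solution
    print(verify_plan(task, plan))  # True
    print(verify_plan(task, plan[::-1]))  # False
```

Because validity is checked mechanically, a model's plan can be graded without human labels, and raising `depth` makes tasks harder independently of how many steps they contain. A real benchmark would likely use richer domains and difficulty measures than this sketch.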

paper

SOURCE

https://huggingface.co/papers/2605.20873