[HUGGINGFACE] score: 0.62
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models
May 19, 2026
PlanningBench introduces a controllable benchmark that generates scalable, verifiable planning data for both evaluating and training LLMs. Unlike static benchmarks, it ties difficulty to structural sources rather than surface proxies, and it supports automatic verification. It is relevant for teams building or evaluating reasoning-heavy LLM systems where planning capability is a bottleneck.
paper