UrduMMLU: 26,431-Question Native Benchmark Across 26 Subjects for 230M-Speaker Language
June 8, 2026
UrduMMLU is a natively sourced, MMLU-style benchmark for Urdu with 26,431 multiple-choice questions across 26 subjects, avoiding translation artifacts. Among 30 evaluated LLMs, Gemini-3.5-Flash leads at 90.2% accuracy and no other model exceeds 85%; the best open-source model trails by ~8 points under both English and Urdu prompting conditions.
HOW THIS AFFECTS YOU
● Researcher: Provides a rigorous, dual-annotated evaluation resource that exposes a roughly 8-point open-source gap on Urdu, useful for multilingual model development and benchmarking.