[arXiv] score: 0.13

UrduMMLU: 26,431-Question Native Benchmark Across 26 Subjects for 230M-Speaker Language

June 8, 2026

UrduMMLU is a natively sourced, MMLU-style benchmark for Urdu with 26,431 multiple-choice questions across 26 subjects; because the questions are not translated, it avoids translation artifacts. Across 30 evaluated LLMs, under both English and Urdu prompting conditions, Gemini-3.5-Flash leads at 90.2% accuracy, no other model exceeds 85%, and the best open-source model trails by about 8 points.

HOW THIS AFFECTS YOU

Researcher: Provides a rigorous, dual-annotated evaluation resource that exposes an 8-point open-source gap on Urdu, useful for multilingual model development and benchmarking.
