[arXiv] score: 0.44
DocAtlas: Multilingual Document Understanding Across 80+ Languages
May 14, 2026
DocAtlas uses differential rendering and synthetic generation to construct high-fidelity OCR datasets and benchmarks covering 82 languages and 9 document understanding tasks, addressing multilingual document understanding for low-resource languages.
cs.CL cs.CV cs.LG