[arXiv] score: 0.67

Hubness, Not Anisotropy, Causes Cross-Lingual Retrieval Asymmetry in 5 Production Embedders

May 27, 2026

Across five production multilingual encoders (Gemini, Mistral, OpenAI-L, OpenAI-S, Qwen) tested on 6,518 idiomatic expressions in four languages, hubness accounts for 49.5% of the variance in retrieval reciprocity failure, versus 0.3% for anisotropy. This identifies hubness as the dominant geometric cause of cross-lingual retrieval asymmetry.
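To make the two quantities concrete, here is a minimal sketch of how hubness and reciprocity failure could be measured. This summary does not give the paper's exact definitions, so the sketch assumes the common ones: hubness as the k-occurrence count N_k (how often an item lands in other items' top-k lists), and a reciprocity failure as a source item whose top-1 target does not retrieve it back. The function names and the toy 2-D unit vectors are hypothetical.

```python
from collections import Counter

def dot(u, v):
    # Dot product; equals cosine similarity for unit-length vectors.
    return sum(a * b for a, b in zip(u, v))

def top_k(query, candidates, k):
    """Indices of the k candidates most similar to the query."""
    scores = [dot(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    return order[:k]

def k_occurrence(queries, candidates, k):
    """N_k: how many queries place each candidate in their top-k list."""
    counts = Counter()
    for q in queries:
        counts.update(top_k(q, candidates, k))
    return [counts[j] for j in range(len(candidates))]

def reciprocity_failures(src, tgt):
    """Source indices whose top-1 target does not retrieve them back."""
    failures = []
    for i, q in enumerate(src):
        j = top_k(q, tgt, 1)[0]
        if top_k(tgt[j], src, 1)[0] != i:
            failures.append(i)
    return failures

# Toy unit vectors (assumed data): tgt[0] sits between src[0] and src[1].
src = [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0)]
tgt = [(0.8, 0.6), (0.6, -0.8), (-0.6, 0.8)]

print(k_occurrence(src, tgt, 1))       # [2, 0, 1]: tgt[0] is a hub
print(reciprocity_failures(src, tgt))  # [0]: src[0] -> tgt[0] -> src[1]
```

In this toy case the hub `tgt[0]` absorbs two source queries but can return only one of them, which is exactly the kind of asymmetry the summary attributes to hubness.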

cs.CL

HOW THIS AFFECTS YOU

● builder: If you're building cross-lingual retrieval systems with any of these five encoders, hub-aware re-ranking is the fix, not anisotropy correction or magnitude normalization.

● researcher: Pre-registered falsification conditions and a partial R² decomposition make this a methodologically rigorous causal claim about multilingual embedding geometry, and one worth building on.
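For builders, one possible form of hub-aware re-ranking is sketched below. The summary does not say which method the paper uses, so this example swaps in CSLS (cross-domain similarity local scaling), a well-known hubness correction that penalizes items with high average similarity to their neighbors. The neighborhood size `k`, the function names, and the toy unit vectors are all assumptions for illustration.

```python
def dot(u, v):
    # Dot product; equals cosine similarity for unit-length vectors.
    return sum(a * b for a, b in zip(u, v))

def mean_top_k(scores, k):
    """Mean of the k largest similarity scores (local density estimate)."""
    return sum(sorted(scores, reverse=True)[:k]) / k

def csls_scores(src, tgt, k=2):
    """CSLS(i, j) = 2 * sim(i, j) - r_src(i) - r_tgt(j)."""
    sim = [[dot(s, t) for t in tgt] for s in src]
    # Average similarity of each source item to its k nearest targets.
    r_src = [mean_top_k(row, k) for row in sim]
    # Average similarity of each target item to its k nearest sources.
    r_tgt = [mean_top_k([sim[i][j] for i in range(len(src))], k)
             for j in range(len(tgt))]
    return [[2 * sim[i][j] - r_src[i] - r_tgt[j] for j in range(len(tgt))]
            for i in range(len(src))]

def rerank_top1(src, tgt, k=2):
    """Top-1 target index for each source item under CSLS scores."""
    scores = csls_scores(src, tgt, k)
    return [max(range(len(tgt)), key=lambda j: row[j]) for row in scores]

def raw_top1(src, tgt):
    """Top-1 target index for each source item under plain similarity."""
    return [max(range(len(tgt)), key=lambda j: dot(s, tgt[j])) for s in src]

# Toy unit vectors (assumed data): tgt[0] is a hub under raw similarity.
src = [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0)]
tgt = [(0.8, 0.6), (0.6, -0.8), (-0.6, 0.8)]

print(raw_top1(src, tgt))     # [0, 0, 2]: two queries collapse onto tgt[0]
print(rerank_top1(src, tgt))  # [1, 0, 2]: the hub's score is discounted
```

The design choice here is that CSLS needs no retraining and no change to the encoder: it only rescales an existing similarity matrix, so it can sit as a post-processing step behind any of the embedders. Whether it recovers the asymmetry reported in the paper is not something this sketch can confirm.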

SOURCE

https://arxiv.org/abs/2605.26575
