[arXiv] score: 0.67
Hubness, Not Anisotropy, Causes Cross-Lingual Retrieval Asymmetry in 5 Production Embedders
May 27, 2026
Across five production multilingual encoders (Gemini, Mistral, OpenAI-L, OpenAI-S, Qwen) tested on 6,518 idiomatic expressions in four languages, hubness accounts for 49.5% of the variance in retrieval reciprocity failure, versus 0.3% for anisotropy. This identifies hubness as the dominant geometric cause of cross-lingual retrieval asymmetry.
cs.CL
HOW THIS AFFECTS YOU
● builder: If you're building cross-lingual retrieval systems with any of these five encoders, hub-aware re-ranking is the fix, not anisotropy correction or magnitude normalization.
● researcher: Pre-registered falsification conditions and partial R² decomposition make this a methodologically rigorous causal claim about multilingual embedding geometry worth building on.
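
For builders, one widely used form of hub-aware re-ranking is cross-domain similarity local scaling (CSLS), which subtracts each item's average similarity to its nearest neighbors so that candidates close to everything stop winning every query. The summary above does not say which re-ranking method the paper evaluates, so the following is only a minimal sketch of CSLS under that assumption. The function names (`csls_scores`, `top1`, `unit`) and the 2-D toy vectors are hypothetical and chosen for illustration; real encoder embeddings would replace them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def csls_scores(queries, candidates, k=2):
    """Return (raw cosine matrix, CSLS-adjusted matrix).

    CSLS(q, c) = 2*cos(q, c) - r(q) - r(c), where r(.) is the mean cosine
    of an item to its k nearest neighbors on the other side.
    """
    sim = [[cosine(q, c) for c in candidates] for q in queries]
    # Mean similarity of each query to its k most similar candidates.
    r_q = [sum(sorted(row, reverse=True)[:k]) / k for row in sim]
    # Mean similarity of each candidate to its k most similar queries.
    r_c = [sum(sorted(col, reverse=True)[:k]) / k for col in zip(*sim)]
    adjusted = [
        [2 * sim[i][j] - r_q[i] - r_c[j] for j in range(len(candidates))]
        for i in range(len(queries))
    ]
    return sim, adjusted

def top1(scores):
    """Index of the best-scoring candidate for each query."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

def unit(degrees):
    """Hypothetical toy embedding: a 2-D unit vector at the given angle."""
    rad = math.radians(degrees)
    return [math.cos(rad), math.sin(rad)]

# Toy data (illustrative only): candidate 0 sits between both queries and
# acts as a hub; candidates 1 and 2 are the intended matches.
queries = [unit(30), unit(60)]
candidates = [unit(45), unit(10), unit(80)]

raw, adjusted = csls_scores(queries, candidates, k=2)
raw_top1 = top1(raw)        # [0, 0]: the hub wins both queries
csls_top1 = top1(adjusted)  # [1, 2]: the hub is penalized
print(raw_top1, csls_top1)
```

In this toy case the hub candidate is the nearest neighbor of both queries under raw cosine, while the CSLS adjustment lowers its score enough for each query to retrieve its own match. The neighborhood size `k` is a tuning parameter; the value used here is arbitrary.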