[HUGGINGFACE] score: 0.48
SemBridge Initializes Sparse Encoder Vocabulary Tokens via Multilingual Dense Embeddings
May 24, 2026
SemBridge adapts English-centric sparse retrieval encoders to non-English languages. It initializes each target-vocabulary token embedding from a small set of semantically related source tokens, which are identified with multilingual dense bridge models, so that semantic noise is filtered out during cross-lingual transfer.
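The initialization idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `init_target_embeddings`, the top-k selection, the softmax weighting, and the `temperature` parameter are all assumptions made for illustration. The sketch assumes that every source and target token already has a vector from some multilingual dense bridge model, and that a target token's embedding is set to a weighted average of the embeddings of its nearest source tokens in bridge space.

```python
import numpy as np


def init_target_embeddings(src_emb, bridge_src, bridge_tgt, k=3, temperature=0.1):
    """Hypothetical sketch: initialize target-token embeddings from source tokens.

    src_emb:    (V_s, d) embeddings of source-vocabulary tokens in the sparse encoder
    bridge_src: (V_s, m) bridge-model vectors for the source tokens
    bridge_tgt: (V_t, m) bridge-model vectors for the target tokens
    k:          number of related source tokens kept per target token (assumed)
    temperature: softmax temperature for the weights (assumed)
    """
    # L2-normalize bridge vectors so the dot product is cosine similarity.
    bs = bridge_src / np.linalg.norm(bridge_src, axis=1, keepdims=True)
    bt = bridge_tgt / np.linalg.norm(bridge_tgt, axis=1, keepdims=True)
    sims = bt @ bs.T  # (V_t, V_s)

    # Keep only the k most similar source tokens per target token;
    # discarding the rest is what limits noise in this sketch.
    topk = np.argpartition(-sims, k - 1, axis=1)[:, :k]   # (V_t, k)
    topk_sims = np.take_along_axis(sims, topk, axis=1)    # (V_t, k)

    # Numerically stable softmax over the kept similarities.
    logits = topk_sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted average of the selected source embeddings: (V_t, d).
    return np.einsum("tk,tkd->td", weights, src_emb[topk])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_emb = rng.normal(size=(100, 16))     # toy source embeddings
    bridge_src = rng.normal(size=(100, 8))   # toy bridge vectors
    bridge_tgt = rng.normal(size=(20, 8))
    tgt_emb = init_target_embeddings(src_emb, bridge_src, bridge_tgt, k=5)
    print(tgt_emb.shape)
```

A small `k` keeps each target token tied to a few closely related source tokens; how SemBridge actually selects and weights those tokens is not specified in this summary.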
HOW THIS AFFECTS YOU
● Builder: You can use SemBridge to extend existing sparse retrieval systems (SPLADE-style) to non-English languages without retraining from scratch.
● Researcher: The vocabulary-level semantic alignment via dense bridge models is a novel initialization strategy that addresses the structural mismatch in cross-lingual sparse encoder adaptation.