Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings
May 26, 2026
A deterministic, training-free embedding compression method reduces 384-dimensional float32 sentence vectors from 1,536 bytes to 48 bytes (a 32x reduction) using a sparse signed Johnson-Lindenstrauss projection followed by scalar quantization. No learned codebooks or corpus statistics are required, so new vectors can be indexed immediately. The method is evaluated on 9,304 multilingual sentence-similarity pairs across 29 subsets, using a multilingual MiniLM encoder; queries are kept in float32 at search time.
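The pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state the output dimensionality, bit width, sparsity, or quantization range, so the code assumes 48 output dimensions at 8 bits each (which gives 48 bytes), 4 nonzero entries per input dimension, and a fixed clipping range of 0.5 for unit-normalized inputs. The names `sparse_sign_matrix`, `encode`, and `score` are hypothetical.

```python
import numpy as np

D_IN = 384   # input dimensionality (from the abstract)
D_OUT = 48   # ASSUMPTION: 48 dims x 1 byte each = 48 bytes
NNZ = 4      # ASSUMPTION: nonzero entries per input dimension
CLIP = 0.5   # ASSUMPTION: fixed range, so no corpus statistics are needed
SEED = 0     # fixed seed, so the projection is reproducible and stateless


def sparse_sign_matrix(d_in=D_IN, d_out=D_OUT, nnz=NNZ, seed=SEED):
    """Build a sparse projection whose nonzeros are +/- 1/sqrt(nnz)."""
    rng = np.random.default_rng(seed)
    P = np.zeros((d_in, d_out), dtype=np.float32)
    for i in range(d_in):
        cols = rng.choice(d_out, size=nnz, replace=False)
        signs = rng.choice(np.array([-1.0, 1.0], dtype=np.float32), size=nnz)
        P[i, cols] = signs / np.sqrt(nnz)
    return P


def encode(x, P, clip=CLIP):
    """Project a float32 vector, then quantize each coordinate to int8."""
    y = np.asarray(x, dtype=np.float32) @ P
    q = np.rint(np.clip(y, -clip, clip) / clip * 127.0)
    return q.astype(np.int8)


def score(query, code, P, clip=CLIP):
    """Asymmetric similarity: float32 projected query vs. dequantized code."""
    q_proj = np.asarray(query, dtype=np.float32) @ P
    return float(q_proj @ (code.astype(np.float32) * (clip / 127.0)))


# Usage: index one unit-normalized vector and score it against itself.
P = sparse_sign_matrix()
vec = np.random.default_rng(1).standard_normal(D_IN).astype(np.float32)
vec /= np.linalg.norm(vec)
code = encode(vec, P)
print(code.nbytes, score(vec, code, P))
```

Because the matrix is regenerated from a fixed seed and the quantization range is a constant, encoding a new vector depends on nothing but the vector itself, which is what allows indexing without a training pass in this sketch.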