[arXiv] score: 0.38
Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs
May 13, 2026
Linear additivity in VLM embeddings (CLIP, SigLIP 2) enables decomposition of scene representations into foreground and background components to mitigate spurious background-object correlations.
cs.CV, cs.AI
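The additivity idea in the summary can be illustrated with a small sketch. This is not the paper's method: it assumes a scene embedding is roughly the sum of a foreground and a background vector, uses random synthetic vectors in place of real CLIP or SigLIP 2 embeddings, and swaps in a plain orthogonal projection to remove the background direction. The names `project_out`, `fg`, `bg`, and `scene` are hypothetical.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(l2_normalize(a), l2_normalize(b)))


def project_out(scene: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Remove the component of `scene` that lies along `background`.

    Assumption (not taken from the paper): if scene ~ fg + bg, dropping the
    background direction leaves a vector dominated by the foreground.
    """
    b_hat = l2_normalize(background)
    residual = scene - np.dot(scene, b_hat) * b_hat
    return l2_normalize(residual)


# Synthetic stand-ins for embeddings; real ones would come from a VLM encoder.
rng = np.random.default_rng(0)
dim = 512
fg = rng.normal(size=dim)          # hypothetical foreground (object) embedding
bg = rng.normal(size=dim)          # hypothetical background embedding
scene = l2_normalize(fg + bg)      # additive composition, then unit-normalized

recovered = project_out(scene, bg)

# Similarity to the foreground before and after removing the background.
print(cosine(scene, fg), cosine(recovered, fg))
```

In this toy setting the recovered vector is exactly orthogonal to the background direction and lies closer to the foreground than the raw scene vector does. Whether real VLM embeddings behave this way depends on how well the additivity assumption holds for them.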