Head-Wise Representation Alignment for Multimodal LLMs
June 21, 2026
HeRA (Head-Wise Representation Alignment) improves multimodal LLMs by enforcing cross-modal alignment at the level of individual attention heads rather than at fixed layers. The method preserves the topological structure of representations across modalities, measured with a mutual k-nearest-neighbor alignment metric.
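To make the metric concrete, here is a minimal sketch of a mutual k-nearest-neighbor alignment score in its commonly used form: for each sample, find its k nearest neighbors in two feature spaces and measure how much the two neighbor sets overlap. The function names, the choice of cosine similarity, and the toy data are assumptions made for illustration; the summary does not specify how HeRA computes the metric or how it is turned into a training signal.

```python
import math


def knn_indices(feats, k):
    """For each row, return the set of indices of its k nearest neighbors.

    Assumption: neighbors are ranked by cosine similarity, self excluded.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    unit = [normalize(v) for v in feats]
    neighbors = []
    for i, u in enumerate(unit):
        sims = [
            (sum(a * b for a, b in zip(u, w)), j)
            for j, w in enumerate(unit)
            if j != i
        ]
        # Highest similarity first; ties broken by lower index for determinism.
        sims.sort(key=lambda t: (-t[0], t[1]))
        neighbors.append({j for _, j in sims[:k]})
    return neighbors


def mutual_knn_alignment(feats_a, feats_b, k):
    """Average fraction of shared k-nearest neighbors between two feature sets.

    feats_a[i] and feats_b[i] must describe the same sample (for example an
    image and its caption). Returns a value in [0, 1].
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both feature sets must cover the same samples")
    if not 0 < k < len(feats_a):
        raise ValueError("k must satisfy 0 < k < number of samples")
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    shared = sum(len(a & b) for a, b in zip(nn_a, nn_b))
    return shared / (k * len(feats_a))


# Toy data (hypothetical): four paired samples in two 2-D feature spaces.
vision = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_same = [[2.0, 0.0], [1.8, 0.2], [0.0, 2.0], [0.2, 1.8]]   # same neighborhoods
text_mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # different neighborhoods

score_same = mutual_knn_alignment(vision, text_same, k=1)
score_mixed = mutual_knn_alignment(vision, text_mixed, k=1)
```

Under this sketch, a head-wise variant would call `mutual_knn_alignment` once per attention head on that head's output features, rather than once per layer. Note that the score itself is not differentiable, so using it as a regularizer would need a surrogate objective that the summary does not describe.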
HOW THIS AFFECTS YOU
● Researcher: You can improve multimodal alignment by regularizing the fine-grained topological structure of individual transformer heads.