[HUGGINGFACE] score: 0.42
MERIT Disentangles Music Similarity Into Melody, Rhythm, and Timbre Heads
May 25, 2026
MERIT trains separate representation heads for melody, rhythm, and timbre using conditional audio generation and source-separated stems to enforce single-factor variation in training data. Each head responds strongly to its target dimension while performing near chance on the others, enabling factor-specific similarity queries.
HOW THIS AFFECTS YOU
● Builder: You can use MERIT's disentangled heads to power music search or recommendation features that let users query by a specific perceptual dimension (melody, rhythm, or timbre) rather than by a single similarity score.
● Researcher: The training strategy of using conditional generation to synthesize single-factor variation is a transferable technique for disentanglement in other audio or multimodal domains.
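A factor-specific similarity query of the kind described above might look roughly like the sketch below. It is not MERIT's API: the summary does not describe an interface, so the function names (`cosine_scores`, `rank_by_factor`), the 8-dimensional embeddings, and the random placeholder data are all assumptions made for illustration. The only idea taken from the summary is that each track has one embedding per head and that a query ranks tracks using a single head.

```python
import numpy as np

# The three heads named in the summary. Everything else here is hypothetical.
FACTORS = ("melody", "rhythm", "timbre")


def cosine_scores(query_vec, catalog_mat):
    """Cosine similarity between one query vector and each row of a catalog matrix."""
    q = query_vec / np.linalg.norm(query_vec)
    c = catalog_mat / np.linalg.norm(catalog_mat, axis=1, keepdims=True)
    return c @ q


def rank_by_factor(query_emb, catalog_emb, factor, top_k=3):
    """Rank catalog tracks by similarity to the query along a single factor.

    query_emb:   dict mapping factor name -> 1-D embedding (assumed head output)
    catalog_emb: dict mapping factor name -> 2-D array, one row per track
    Returns a list of (track_index, score) pairs, best match first.
    """
    if factor not in FACTORS:
        raise ValueError(f"unknown factor: {factor!r}")
    scores = cosine_scores(query_emb[factor], catalog_emb[factor])
    order = np.argsort(-scores)[:top_k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Placeholder data standing in for real head outputs (assumption: 5 tracks, 8 dims).
rng = np.random.default_rng(0)
catalog = {f: rng.normal(size=(5, 8)) for f in FACTORS}
query = {f: rng.normal(size=8) for f in FACTORS}

# Make the query share its rhythm embedding with track 2, so a rhythm query
# should surface track 2 while melody and timbre queries need not.
query["rhythm"] = catalog["rhythm"][2].copy()

rhythm_hits = rank_by_factor(query, catalog, "rhythm")
melody_hits = rank_by_factor(query, catalog, "melody")
```

In a real system the per-factor arrays would come from the trained heads, and the brute-force scoring would likely be replaced by one approximate nearest-neighbor index per factor.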