[arXiv] score: 0.13

OPPO Reinforcement Learning Reduces Cross-Modal Hallucination in Emotion Models

June 25, 2026

OPPO applies a KL penalty specifically to modality-specific evidence tokens under masked inputs, suppressing cross-modal hallucination in multimodal emotion reasoning. An accompanying benchmark, MEP-Bench, separately measures utilization and faithfulness of visual, acoustic, and emotion cues. Results show improved grounding in multimodal reasoning trajectories over baseline omni-MLLMs.
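To make the idea of a token-restricted KL penalty concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation. The summary does not give the KL direction, the reference distribution, the weighting, or how evidence tokens are identified. The function names (`token_kl`, `evidence_kl_penalty`), the `beta` coefficient, and the binary `evidence_mask` are all hypothetical. The sketch assumes per-token logits from the policy under a modality-masked input and from some reference, and averages KL only over positions flagged as evidence tokens.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one token's vocabulary logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def token_kl(p_logits, q_logits):
    # KL(p || q) at a single token position (direction is an assumption).
    lp = log_softmax(p_logits)
    lq = log_softmax(q_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def evidence_kl_penalty(masked_logits, ref_logits, evidence_mask, beta=0.1):
    # Mean KL over positions flagged as evidence tokens, scaled by beta.
    # Positions with mask 0 contribute nothing to the penalty.
    kls = [
        token_kl(p, q)
        for p, q, m in zip(masked_logits, ref_logits, evidence_mask)
        if m
    ]
    if not kls:
        return 0.0
    return beta * sum(kls) / len(kls)

# Toy example: two token positions, vocabulary of size three.
masked = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # policy logits under a masked modality
ref = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # hypothetical reference logits
penalty = evidence_kl_penalty(masked, ref, evidence_mask=[1, 0])
```

In this toy case only the first position is flagged as evidence, so only its divergence from the reference is penalized; the second position is ignored regardless of its logits.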

HOW THIS AFFECTS YOU

● Researcher: The modality-masking KL penalty is a technique that could transfer to any multimodal RL training setup where cross-modal hallucination is a concern, not only emotion tasks.

read original ↗ arxiv.org
