OPPO Reinforcement Learning Reduces Cross-Modal Hallucination in Emotion Models
June 25, 2026
OPPO applies a KL penalty specifically to modality-specific evidence tokens under masked inputs to suppress cross-modal hallucination in multimodal emotion reasoning. An accompanying benchmark, MEP-Bench, measures the utilization and faithfulness of visual, acoustic, and emotion cues separately. The reported results show better grounding in multimodal reasoning trajectories than baseline omni-MLLMs.
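The summary does not spell out OPPO's exact objective. The sketch below assumes the penalty is built from a per-token KL divergence between the model's next-token distributions with the full input and with one modality masked, counted only at tokens tagged as evidence for that modality. The helper names, the KL direction (full || masked), and the boolean evidence mask are illustrative assumptions. The sketch also leaves out how the term is weighted and signed inside the RL objective.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) and division by zero.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def masked_evidence_kl(
    full_probs: Sequence[Sequence[float]],
    masked_probs: Sequence[Sequence[float]],
    evidence_mask: Sequence[bool],
) -> float:
    """Hypothetical helper: mean per-token KL(full || masked) over evidence positions only.

    full_probs[t]   : next-token distribution at step t with all modalities present
    masked_probs[t] : next-token distribution at step t with one modality masked (assumed setup)
    evidence_mask[t]: True if token t is tagged as evidence for the masked modality
    """
    if not (len(full_probs) == len(masked_probs) == len(evidence_mask)):
        raise ValueError("sequence lengths must match")
    terms = [
        kl_divergence(p, q)
        for p, q, is_evidence in zip(full_probs, masked_probs, evidence_mask)
        if is_evidence
    ]
    # No evidence tokens means there is nothing to penalize.
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    full = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
    masked = [[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.3, 0.3, 0.4]]
    evidence = [False, True, False]  # only step 1 is treated as modality-specific evidence
    print(round(masked_evidence_kl(full, masked, evidence), 4))  # prints 0.5362
```

Only step 1 is flagged as evidence in this example, so steps 0 and 2 are ignored even when they differ. Restricting the term to evidence tokens is what separates this from an ordinary sequence-level KL penalty. In a real training loop, the same restriction would be applied to per-token log-probabilities produced by the model rather than to hand-written lists.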
HOW THIS AFFECTS YOU
● Researcher: The modality-masking KL penalty could transfer beyond emotion tasks to other multimodal RL training setups where cross-modal hallucination is a concern.