● builder: If you're fine-tuning models with RL, reward model oversensitivity may be silently degrading output quality. This paper gives you diagnostic tools to check.
● researcher: Monte Carlo dropout-based reward clustering is a concrete, implementable mitigation for reward hacking, and the discriminative ability metric is worth adopting in RLHF pipelines.