● researcher: MAR and the VETO benchmark provide a quantitative tool for diagnosing alignment overcorrection, separate from standard safety or bias metrics.
● policy: Misfired alignment, where models reject evidence-backed conclusions, is a measurable failure mode that complicates safety evaluation and may affect high-stakes deployment contexts.