Majority Voting Flaws in Hate Speech Annotation Pipelines
June 30, 2026
Research shows that collapsing annotator disagreement into majority votes in HateXplain hides critical errors at the hate/offensive boundary, where model accuracy drops from 80% to 58%.
HOW THIS AFFECTS YOU
● Researcher: You should use soft-label models or per-annotator multi-head models to capture nuances in disagreement.
● Policy: Standard evaluation metrics may fail to detect significant failures in safety-critical moderation models.
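
To make the difference between majority-vote labels and soft labels concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in the research: the three-way label set (hate, offensive, normal), the three-annotator votes, and the helper names majority_vote, soft_label, and cross_entropy are all hypothetical and chosen only for this example.

```python
from collections import Counter
import math

# Assumed label set for illustration; order fixes the soft-label vector layout.
LABELS = ("hate", "offensive", "normal")


def majority_vote(votes):
    """Return the single most common label, or None when the top count is tied."""
    counts = Counter(votes)
    top_label, top_count = counts.most_common(1)[0]
    tied = sum(1 for n in counts.values() if n == top_count)
    return top_label if tied == 1 else None


def soft_label(votes):
    """Return the fraction of annotators who chose each label, in LABELS order."""
    counts = Counter(votes)
    return [counts[label] / len(votes) for label in LABELS]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical annotations for two posts (made-up votes, not dataset rows).
unanimous = ["hate", "hate", "hate"]
contested = ["hate", "hate", "offensive"]

# Both posts collapse to the same hard label under majority voting...
print(majority_vote(unanimous), majority_vote(contested))

# ...but their soft labels keep the disagreement visible.
print(soft_label(unanimous))
print(soft_label(contested))

# A confident "hate" prediction is penalized more against the contested soft target
# than against the collapsed one-hot target.
prediction = [0.90, 0.05, 0.05]
one_hot = [1.0, 0.0, 0.0]
print(cross_entropy(one_hot, prediction))
print(cross_entropy(soft_label(contested), prediction))
```

In this sketch the two posts are indistinguishable after majority voting, while the soft labels separate the unanimous case from the one that sits on the hate/offensive boundary. Training or evaluating against the soft target is one way to keep an overconfident prediction on a contested post from looking as good as the same prediction on a unanimous one.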