[arXiv]

SCPO Reward Model Training Boosts Minority-Culture Performance by Up to 7 Points

June 18, 2026

Steerable Cultural Preference Optimization (SCPO) trains reward models to balance diverse cultural preferences without over-indexing on majority annotator groups. Evaluated on PRISM and GlobalOpinionQA across 7 countries, it improves reward model performance on minority groups by up to 7 points and is up to 280% more training-data-efficient than the baseline.

HOW THIS AFFECTS YOU

● Researcher: The data-efficiency gain of up to 280% and the cross-country evaluation on two datasets make SCPO a strong candidate to benchmark against standard RLHF reward model training for culturally diverse deployments.

● Policy: SCPO gives alignment teams a measurable method for reducing cultural bias in reward models without requiring proportional data from each region, which may change how they justify global deployment.

