[arXiv] score: 0.81

Aligned LLMs Predict Norms, Not Human Behavior — 10:1 Gap vs Base Models

May 27, 2026

Across 120 base-aligned model pairs evaluated on 10,000+ real human decisions in strategic games, base models outperform aligned models at predicting actual human choices by nearly 10:1. Aligned models, by contrast, dominate on one-shot textbook games, which suggests that alignment instills normative rather than descriptive behavioral priors.

cs.CL cs.AI cs.GT

HOW THIS AFFECTS YOU

● researcher: You need to account for this normative bias when using aligned LLMs as proxies for human behavior in simulation or evaluation pipelines.

● policy: Worth watching because alignment objectives may systematically diverge from modeling real human decision-making, with implications for how we interpret and audit model behavior in social contexts.

SOURCE

https://arxiv.org/abs/2603.17218
