[arXiv]

Keyword Lexicons Produce Spurious r=0.85 Correlations Fixed by LLM Scoring

June 25, 2026

Keyword-based sentiment scoring can produce large, statistically significant but artifactual correlations. For Dalio, the negative-affect/certainty correlation drops from r = 0.851 to r = 0.206 when keyword scoring is replaced with LLM zero-shot classification on 32,625 sentences. LLM scoring instead surfaces a genuine negative-hedging coupling that lexicons miss entirely.
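
To make the comparison concrete, here is a minimal sketch of how a per-sentence correlation between two scored dimensions might be computed. The lexicons, the function names, and the use of raw hit counts are all illustrative assumptions, not the paper's word lists or method. An LLM-based scorer is not implemented here; it would be passed in as another function that returns one score per sentence.

```python
import math
import re

# Hypothetical toy lexicons, for illustration only (not the paper's word lists).
NEGATIVE = {"crisis", "loss", "fail", "risk", "bad"}
CERTAINTY = {"always", "never", "must", "certainly", "clearly"}


def keyword_score(sentence, lexicon):
    """Count lexicon hits in a sentence (raw count, no length normalization)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(1 for t in tokens if t in lexicon)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def dimension_correlation(sentences, score_a, score_b):
    """Score every sentence on two dimensions and correlate the two columns."""
    a = [score_a(s) for s in sentences]
    b = [score_b(s) for s in sentences]
    return pearson_r(a, b)


if __name__ == "__main__":
    # Made-up sentences, only to exercise the functions.
    toy = [
        "We must never ignore risk.",
        "Markets moved a little today.",
        "A crisis always brings loss, and bad policy will certainly fail.",
        "The meeting is on Tuesday.",
    ]
    r = dimension_correlation(
        toy,
        lambda s: keyword_score(s, NEGATIVE),
        lambda s: keyword_score(s, CERTAINTY),
    )
    print(f"keyword-based r = {r:.3f}")
```

Calling `dimension_correlation` twice on the same sentences, once with keyword scorers and once with classifier-based scorers, would give the two r values being compared in the summary above.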

HOW THIS AFFECTS YOU

● Researcher: A direct warning for NLP and computational social science work. Keyword lexicon baselines can generate large-effect spurious findings that LLM classifiers do not replicate.
