Keyword Lexicons Produce Spurious r=0.85 Correlations Fixed by LLM Scoring
June 25, 2026
Keyword-based sentiment scoring can produce large, statistically significant correlations that are artifacts of the scoring method. Dalio's negative-affect/certainty correlation drops from r = 0.851 to r = 0.206 when keyword scoring is replaced with LLM zero-shot classification on 32,625 sentences. LLM scoring instead surfaces a genuine negative-hedging coupling that the lexicons missed entirely.
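The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the keyword sets, the `lexicon_score` helper, and the sample sentence are hypothetical, and LLM scores are assumed to arrive as plain lists of numbers from whatever classifier is in use.

```python
from math import sqrt

# Hypothetical keyword sets, for illustration only (not the study's lexicons).
NEGATIVE_WORDS = {"crisis", "loss", "fear"}
CERTAINTY_WORDS = {"always", "never", "certainly"}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def lexicon_score(sentence, keywords):
    """Count the tokens in a sentence that appear in the keyword set."""
    tokens = (t.strip(".,;:!?\"'") for t in sentence.lower().split())
    return sum(1 for t in tokens if t in keywords)


def replication_check(lex_a, lex_b, llm_a, llm_b):
    """Return (r_lexicon, r_llm) for the same pair of constructs.

    lex_a / lex_b: per-sentence lexicon scores for constructs A and B.
    llm_a / llm_b: per-sentence LLM scores for the same sentences (assumed
    to be supplied by the caller).
    """
    return pearson_r(lex_a, lex_b), pearson_r(llm_a, llm_b)
```

A large gap between the two returned values is the warning sign the summary describes. For scale, r = 0.851 corresponds to about 72% shared variance (r squared), while r = 0.206 corresponds to about 4%.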
HOW THIS AFFECTS YOU
● Researcher: A direct warning for NLP and computational social science work. Keyword lexicon baselines can generate large-effect spurious findings that LLM classifiers do not replicate.