LOCOS Scoring Identifies Non-Literal Retrieval Heads via OV-Circuit Projections
June 30, 2026
Logit-Contribution Scoring (LOCOS) identifies attention heads that synthesize information rather than copy text by measuring how much each head's OV-circuit output projects onto the unembedding direction of the answer token. This addresses a gap in existing detectors, which rely on token matching and therefore miss non-literal retrieval mechanisms in long-context models.
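As a rough illustration of the scoring idea, the Python sketch below computes one head's OV-circuit output (attention-weighted residual vectors passed through W_V and then W_O) and takes its dot product with the unembedding column of the answer token. A token-matching baseline is included for contrast. The function names, the toy matrices, and the choice to ignore the final LayerNorm are assumptions made for illustration; this is not the published LOCOS implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i][j]


def vec_mat(v: Vector, m: Matrix) -> Vector:
    """Row vector times matrix: (1 x n) @ (n x k) -> (1 x k)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def head_output(attn: Vector, resid: List[Vector], w_v: Matrix, w_o: Matrix) -> Vector:
    """OV circuit: sum_j attn[j] * (resid[j] @ W_V @ W_O), the head's write to the residual stream."""
    d_model = len(w_o[0])
    out = [0.0] * d_model
    for a_j, x_j in zip(attn, resid):
        contrib = vec_mat(vec_mat(x_j, w_v), w_o)
        for k in range(d_model):
            out[k] += a_j * contrib[k]
    return out


def locos_score(attn, resid, w_v, w_o, w_u, answer_id) -> float:
    """Hypothetical LOCOS-style score: projection of the head's output onto the
    unembedding column of the answer token. Final LayerNorm is ignored (assumption)."""
    out = head_output(attn, resid, w_v, w_o)
    u_answer = [row[answer_id] for row in w_u]  # column answer_id of W_U, length d_model
    return dot(out, u_answer)


def token_match_score(attn: Vector, context_ids: List[int], answer_id: int) -> float:
    """Baseline copy-style detector: attention mass on positions holding the answer token."""
    return sum(a for a, t in zip(attn, context_ids) if t == answer_id)


# Toy setup (hypothetical): d_model = 3, vocab = 3, one-hot token embeddings.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w_v = identity
w_o = [[0.0, 0.0, 1.0],   # maps token 0's direction onto token 2's direction
       [0.0, 1.0, 0.0],
       [1.0, 0.0, 0.0]]
w_u = identity            # unembedding: column t is token t's direction

context_ids = [0, 1]
resid = [identity[t] for t in context_ids]
attn = [1.0, 0.0]         # head attends only to position 0 (token 0)
answer_id = 2             # the answer token never appears in the context

locos = locos_score(attn, resid, w_v, w_o, w_u, answer_id)
copy = token_match_score(attn, context_ids, answer_id)
print(f"LOCOS-style score: {locos}, token-match score: {copy}")
```

In this toy case the head attends only to token 0, yet its OV circuit maps that token onto the answer direction, so the LOCOS-style score is 1.0 while the token-matching score is 0. A real model would also involve the final LayerNorm scale and possibly a baseline subtraction, both omitted here.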
HOW THIS AFFECTS YOU
● Researcher: You can now better interpret how long-context models perform semantic synthesis instead of simple pattern matching.