
LOCOS Scoring Identifies Non-Literal Retrieval Heads via OV-Circuit Projections

June 30, 2026

Logit-Contribution Scoring (LOCOS) identifies attention heads that synthesize information rather than copy text. It does so by measuring how strongly each head's OV-circuit output projects onto the unembedding direction of the answer token. This addresses a limitation of existing detectors, which rely on token matching and therefore miss non-literal retrieval mechanisms in long-context models.
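The summary does not give the exact LOCOS formula, so the following is only a minimal sketch of the general idea under stated assumptions: a head's output at the query position is taken to be the attention-weighted sum of source residual vectors passed through its OV circuit (W_V followed by W_O), and the score is the dot product of that output with the answer token's unembedding vector. The function name `head_logit_contribution`, its parameters, and the toy numbers are hypothetical and are not taken from the original work.

```python
def matvec(vec, mat):
    """Multiply a row vector (length n) by an n x m matrix given as nested lists."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def head_logit_contribution(attn, resid, w_v, w_o, unembed_dir):
    """Project one head's OV output onto an answer-token unembedding direction.

    Assumed shapes (illustrative, not from the source):
      attn        : attention weights from the query position, one per source position
      resid       : residual-stream vectors at the source positions (each of size d_model)
      w_v         : value matrix, d_model x d_head
      w_o         : output matrix, d_head x d_model
      unembed_dir : unembedding vector of the answer token, size d_model
    """
    head_out = [0.0] * len(unembed_dir)
    for weight, x in zip(attn, resid):
        ov = matvec(matvec(x, w_v), w_o)  # OV circuit applied to one source vector
        head_out = [h + weight * c for h, c in zip(head_out, ov)]
    return dot(head_out, unembed_dir)


# Toy example: d_model = 2, d_head = 1, two source positions.
score = head_logit_contribution(
    attn=[0.75, 0.25],
    resid=[[1.0, 0.0], [0.0, 1.0]],
    w_v=[[1.0], [0.0]],
    w_o=[[0.0, 1.0]],
    unembed_dir=[0.0, 2.0],
)
print(score)  # 1.5
```

Under this sketch, a head can receive a high score even when the attended token differs from the answer token, which is the kind of case a token-matching detector would miss.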

HOW THIS AFFECTS YOU

● Researcher: You can now better interpret how long-context models perform semantic synthesis instead of simple pattern matching.

Read original: huggingface.co
