[arXiv] score: 0.19
Upper-Layer Prompt KV Cache Is Structural, Not Semantic — Safe to Replace After Few Decode Steps
June 1, 2026
Controlled splice experiments across Qwen3, Gemma 3, and Llama 3 show that upper-layer prompt KV cache entries encode chat template scaffolding rather than content semantics. After the first few decode steps, these entries can be replaced with a neutral-filler scaffold cache at near-zero accuracy loss. Zeroing the same slots instead collapses accuracy, so the entries are still needed, but for their structural form rather than their content.
cs.CL
HOW THIS AFFECTS YOU
● builder: This finding supports more aggressive KV cache compression in production inference systems, specifically in upper layers after the first few decode steps, with near-zero accuracy loss reported in these experiments.
● researcher: The form-vs-content dissociation across three model families provides a mechanistic basis for principled KV cache pruning research.
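The splice and zeroing conditions described in the summary can be illustrated with a toy sketch. This is not the paper's code: the nested-list cache layout, the helper names, the layer cutoff, and the filler values are all assumptions chosen for illustration. A real cache would hold per-head key and value tensors.

```python
from copy import deepcopy


def make_cache(num_layers, seq_len, dim, fill):
    """Toy cache (hypothetical layout): cache[layer][position] is a vector."""
    return [[[fill] * dim for _ in range(seq_len)] for _ in range(num_layers)]


def replace_upper_prompt_slots(cache, first_upper_layer, prompt_len, source=None):
    """Return a copy of `cache` with prompt slots in upper layers replaced.

    Layers >= first_upper_layer and positions < prompt_len are overwritten
    with entries from `source` (splice condition), or with zeros when
    `source` is None (zeroing condition). Lower layers and decode-step
    positions (>= prompt_len) are left untouched.
    """
    out = deepcopy(cache)
    for layer in range(first_upper_layer, len(out)):
        for pos in range(prompt_len):
            if source is None:
                out[layer][pos] = [0.0] * len(out[layer][pos])
            else:
                out[layer][pos] = list(source[layer][pos])
    return out


# Assumed toy sizes: 4 layers, 4 prompt positions plus 2 decode positions.
real = make_cache(num_layers=4, seq_len=6, dim=2, fill=1.0)
# Stand-in for a cache computed from a neutral-filler prompt (prompt only).
filler = make_cache(num_layers=4, seq_len=4, dim=2, fill=0.5)

spliced = replace_upper_prompt_slots(real, first_upper_layer=2, prompt_len=4, source=filler)
zeroed = replace_upper_prompt_slots(real, first_upper_layer=2, prompt_len=4)
```

Both conditions touch exactly the same slots, so any accuracy gap between them reflects what is written into those slots rather than which slots are modified. Which layers count as "upper" and how many decode steps to wait are model-specific choices that this sketch does not settle.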