Gzip as a Language Model: Compression Efficiency Scores Text Predictions
June 18, 2026
Candidate text continuations can be scored by how efficiently gzip compresses each one when it is appended to a prompt: a candidate that adds fewer compressed bytes is treated as more probable, so compression efficiency serves as a proxy for probability. The approach has no learned parameters and serves as a baseline for understanding what statistical patterns LLMs capture.
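The scoring idea can be sketched in a few lines of Python. This is a minimal illustration and not the method of any particular implementation: the function names (`compressed_size`, `continuation_cost`, `rank_candidates`) are hypothetical, and the choice to score a candidate by the extra compressed bytes it adds over the prompt alone is an assumption about how "compression efficiency" is measured.

```python
import gzip


def compressed_size(text: str) -> int:
    """Number of bytes gzip needs for the UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=9))


def continuation_cost(prompt: str, candidate: str) -> int:
    """Extra compressed bytes the candidate adds on top of the prompt.

    Assumption: a lower cost is read as a higher "probability",
    since gzip can reuse patterns it has already seen in the prompt.
    """
    return compressed_size(prompt + candidate) - compressed_size(prompt)


def rank_candidates(prompt: str, candidates: list[str]) -> list[str]:
    """Sort candidates from cheapest (most predictable) to most expensive."""
    return sorted(candidates, key=lambda c: continuation_cost(prompt, c))


if __name__ == "__main__":
    prompt = "the cat sat on the mat. the cat sat on the mat. the cat sat on "
    candidates = ["zqx vkpw", "the mat."]
    for c in rank_candidates(prompt, candidates):
        print(repr(c), continuation_cost(prompt, c))
```

Here the candidates have equal length so their costs are directly comparable; for candidates of different lengths, dividing the cost by the candidate's length would be one plausible normalization. Because gzip output is measured in whole bytes, scores for short candidates are coarse and ties are common.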
HOW THIS AFFECTS YOU
● Researcher: The compression-as-LM framing offers a parameter-free baseline useful for isolating what neural LLMs learn beyond raw statistical redundancy.