[arXiv] score: 0.58
Autoregressive Probabilistic Transformers Have Strictly Greater Expressivity Than Deterministic Recognizers
May 26, 2026
A formal characterization shows that making transformer language recognizers autoregressive can increase their expressivity, and that probabilistic generation breaks equivalences that hold in the deterministic case. The result bears on which distributions transformer LMs can represent.
cs.CL
HOW THIS AFFECTS YOU
● Researcher: This theoretical result directly challenges assumptions underlying expressivity comparisons between transformer variants and has implications for understanding LLM capabilities and limitations.