arXiv

Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence

April 30, 2026
UniMatrix is a Universal Transformer family built from shared recurrent blocks with ROSA-style residual paths and token-conditioned embedding modulation. It is evaluated at the byte level on WikiText-2 and on synthetic associative recall tasks. UniMatrix-Core reaches 5.084 bits per byte (lower is better), against 5.124 for a parameter-matched Transformer, while using significantly fewer parameters. On associative recall, however, it performs at near-chance accuracy, whereas the Transformer reaches 25.4%. This exposes a fundamental tension: compressing context into a recurrent state gains parameter efficiency but loses the exact key-value retrieval that in-context learning depends on. Researchers building efficient sequence models should weigh this tradeoff carefully before replacing attention with structured recurrence.
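The retrieval tradeoff above can be illustrated with a toy experiment. The summary does not specify UniMatrix's state update, so this is only a sketch under stated assumptions: a generic linear associative memory (a fixed-size outer-product state, in the style of fast weights or linear attention) stands in for "compressed recurrent state", and hard attention over a cache that keeps every key-value pair stands in for Transformer retrieval. Random unit vectors play the role of learned key embeddings. All function names and parameters (`make_episode`, `recall_with_cache`, `recall_with_fixed_state`, `num_pairs`, `dim`, and so on) are hypothetical.

```python
import math
import random


def unit_vector(dim, rng):
    """Random Gaussian vector normalised to unit length (stand-in for a learned key embedding)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_episode(num_pairs, num_values, dim, rng):
    """One synthetic associative-recall episode: (key, value) pairs plus one queried key."""
    keys = [unit_vector(dim, rng) for _ in range(num_pairs)]
    values = [rng.randrange(num_values) for _ in range(num_pairs)]
    q = rng.randrange(num_pairs)
    return keys, values, keys[q], values[q]


def recall_with_cache(keys, values, query, num_values):
    """Attention-style lookup over a cache that stores every pair, so memory grows with context.

    Uses hard attention (argmax over dot products), the zero-temperature limit of softmax.
    num_values is unused; it is accepted only so both recall functions share a signature.
    """
    best = max(range(len(keys)), key=lambda i: dot(query, keys[i]))
    return values[best]


def recall_with_fixed_state(keys, values, query, num_values):
    """Linear associative memory: a dim x num_values state updated by outer products.

    Each (k, v) pair adds k to column v of the state; readout is argmax_v <query, state[:, v]>.
    The state size is independent of num_pairs, so stored keys interfere with each other.
    """
    dim = len(query)
    state = [[0.0] * num_values for _ in range(dim)]
    for k, v in zip(keys, values):  # one recurrent update per (key, value) pair
        for d in range(dim):
            state[d][v] += k[d]
    scores = [sum(query[d] * state[d][v] for d in range(dim)) for v in range(num_values)]
    return max(range(num_values), key=lambda v: scores[v])


def accuracy(recall_fn, num_pairs, num_values, dim, episodes=200, seed=0):
    """Fraction of episodes in which recall_fn returns the value bound to the queried key."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(episodes):
        keys, values, query, target = make_episode(num_pairs, num_values, dim, rng)
        hits += recall_fn(keys, values, query, num_values) == target
    return hits / episodes


if __name__ == "__main__":
    for dim in (8, 64):
        c = accuracy(recall_with_cache, num_pairs=64, num_values=16, dim=dim)
        s = accuracy(recall_with_fixed_state, num_pairs=64, num_values=16, dim=dim)
        print(f"dim={dim}: cache={c:.2f} fixed-state={s:.2f}")
```

In this toy setting the cache recovers the queried value regardless of `dim`, because the query matches its own stored key exactly, while the fixed-size state degrades once the number of stored pairs is large relative to `dim`. It says nothing about UniMatrix's actual numbers; it only shows why a bounded state can trade exact lookup for a constant memory footprint.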
cs.CL, cs.LG