Chiaroscuro Attention Routes Tokens by Spectral Entropy, Hits PPL 36.54 on WikiText-103
June 5, 2026
CHIAR-Former routes each token to either DCT spectral mixing or full self-attention based on that token's spectral entropy. It reaches a validation perplexity of 36.54 on WikiText-103, a 45% improvement over a full-attention baseline. In ablations, routing collapses away from RBF kernels, leaving DCT mixing and attention as complementary and sufficient operators.
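To make the routing signal concrete, here is a minimal sketch of per-token spectral entropy routing in plain Python. It is not the CHIAR-Former implementation. The names `dct2`, `spectral_entropy`, and `route`, the 0.5 threshold, the hard (non-learned) decision, and the choice to send high-entropy tokens to attention are all assumptions made for illustration; the summary above does not specify any of them.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a list of floats (direct O(n^2) form)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def spectral_entropy(x, eps=1e-12):
    """Shannon entropy of the normalized DCT power spectrum, scaled to [0, 1]."""
    power = [c * c for c in dct2(x)]
    total = sum(power) + eps  # eps guards against an all-zero vector
    probs = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(x))


def route(x, threshold=0.5):
    """Hypothetical hard router: the direction and threshold are assumptions."""
    return "attention" if spectral_entropy(x) > threshold else "dct"


# A constant vector puts all its energy in one DCT coefficient (low entropy);
# an impulse spreads energy across many coefficients (high entropy).
smooth_token = [1.0] * 8
spiky_token = [1.0] + [0.0] * 7
print(route(smooth_token), route(spiky_token))
```

In a real model the entropy would be computed over each token's hidden vector, and the router would more likely be a differentiable gate than a fixed threshold; this sketch only shows how the entropy separates spectrally concentrated tokens from spectrally diffuse ones.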
HOW THIS AFFECTS YOU
● Researcher: The spectral entropy routing signal and the finding of routing collapse away from RBF are interesting architectural insights, though the results come from a small 4-layer model on WikiText-103.