[arXiv]

Tapered LLM Architecture Improves Perplexity by Front-Loading Parameter Capacity

June 23, 2026

Allocating more parameters to earlier transformer layers and fewer to later ones improves perplexity under a fixed parameter budget, while the reverse allocation hurts it. The Tapered Language Models (TLMs) principle reduces capacity monotonically with depth, in the MLPs and in other parameter-bearing components. This challenges the default uniform-width architecture inherited from the original transformer.
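To make the fixed-budget idea concrete, here is a minimal sketch of one way to taper MLP hidden widths across depth while keeping the total MLP parameter count close to a uniform baseline. The linear schedule, the `ratio` parameter, and the function names `tapered_widths` and `mlp_params` are illustrative assumptions, not the paper's method; the summary only states that capacity decreases monotonically with depth.

```python
def tapered_widths(num_layers, uniform_width, ratio):
    """Hypothetical linear taper of MLP hidden widths across depth.

    ratio is first-layer width divided by last-layer width; ratio > 1
    front-loads capacity. The mean width equals uniform_width, so the total
    matches a uniform baseline up to rounding.
    """
    if num_layers < 1 or ratio <= 0:
        raise ValueError("num_layers must be >= 1 and ratio must be > 0")
    if num_layers == 1:
        return [uniform_width]
    # Solve first / last = ratio and (first + last) / 2 = uniform_width.
    last = 2 * uniform_width / (1 + ratio)
    first = ratio * last
    step = (first - last) / (num_layers - 1)
    return [round(first - i * step) for i in range(num_layers)]


def mlp_params(d_model, widths):
    """Count MLP weights, assuming two projection matrices per layer, no biases."""
    return sum(2 * d_model * w for w in widths)


if __name__ == "__main__":
    d_model, layers, width = 768, 12, 3072
    uniform = [width] * layers
    tapered = tapered_widths(layers, width, ratio=2.0)
    print("tapered widths:", tapered)
    print("uniform MLP params:", mlp_params(d_model, uniform))
    print("tapered MLP params:", mlp_params(d_model, tapered))
```

Passing a `ratio` below 1 gives the reverse (back-loaded) allocation, which is the configuration the summary reports as hurting perplexity. Attention and embedding parameters are left out of this count.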

HOW THIS AFFECTS YOU

● Researcher: TLMs offer a drop-in architectural modification testable under fixed parameter budgets, with controlled experiments showing perplexity gains over uniform-width baselines.

