[arXiv] score: 0.41
Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
May 14, 2026
Orthrus is a dual-architecture framework that pairs the output fidelity of an autoregressive (AR) LLM with the parallel token generation of a diffusion model. It targets the sequential decoding bottleneck of AR inference. The authors claim exact generation parity with the AR model at diffusion-level decoding speed, addressing the known quality degradation of standalone diffusion LMs. Engineers working on high-throughput inference should evaluate it closely.
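The listing does not say how Orthrus achieves exact parity. One well-known way a fast parallel proposer can reproduce an AR model's greedy output exactly is draft-and-verify (speculative) decoding. The sketch below illustrates only that general idea. The toy functions `verifier` and `drafter` and the `block` parameter are hypothetical stand-ins, and this is not Orthrus's method.

```python
from typing import Callable, List

# Hypothetical toy "model": maps a token prefix to the next token.
Model = Callable[[List[int]], int]


def verifier(prefix: List[int]) -> int:
    """Stand-in for the AR model's greedy next-token choice (toy, deterministic)."""
    return (sum(prefix) * 31 + len(prefix)) % 10


def drafter(prefix: List[int]) -> int:
    """Stand-in for a fast proposer: agrees with the verifier except at every third position."""
    guess = verifier(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 10


def ar_greedy(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain sequential greedy decoding, one token per step."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def draft_and_verify(target: Model, proposer: Model, prompt: List[int],
                     n_new: int, block: int = 4) -> List[int]:
    """Draft up to `block` tokens, keep the prefix the target agrees with,
    and replace the first mismatch with the target's own token."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        k = min(block, goal - len(out))
        # Draft k tokens. This toy drafts them one by one; a parallel proposer
        # would emit the whole block at once.
        draft, ctx = [], list(out)
        for _ in range(k):
            tok = proposer(ctx)
            draft.append(tok)
            ctx.append(tok)
        # Verify each drafted token. Called per position here for clarity;
        # speculative-decoding systems score all k positions in one forward pass.
        for tok in draft:
            expected = target(out)
            out.append(expected)  # always keep the target's token, so output matches AR greedy
            if tok != expected:
                break  # discard the rest of the draft after the first mismatch
    return out


prompt = [1, 2, 3]
print(draft_and_verify(verifier, drafter, prompt, 12) == ar_greedy(verifier, prompt, 12))
```

Because every kept token equals the target model's greedy choice at that position, the output is identical to sequential AR decoding. Any speedup comes only from accepting several drafted tokens per verification round.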
cs.LG, cs.AI