[arXiv] score: 0.41

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

May 14, 2026

Orthrus is a dual-architecture framework that combines the fidelity of autoregressive (AR) LLMs with the parallel token generation of diffusion models, targeting the sequential decoding bottleneck. The paper claims exact generation parity with AR models at diffusion-speed inference, addressing the quality degradation known to affect standalone diffusion LMs. Engineers working on high-throughput inference should evaluate it closely.

cs.LG, cs.AI

SOURCE

https://arxiv.org/abs/2605.12825
