[arXiv]
HybridCodec: Modeling Discrete and Continuous Representations for Efficient Speech Language Models
June 29, 2026
HybridCodec Combines Discrete and Continuous Representations for Efficient Speech Language Models
Researchers propose an approach to improving multimodal text-audio systems that combines temporally compressed discrete tokens with dimensionality-reduced continuous residuals. The HybridCodec framework pairs a hybrid discrete-continuous focal modulation codec with a hybrid Transformer. The authors report better retention of speaker characteristics and fewer autoregressive steps at generation time.
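To make the split between discrete tokens and continuous residuals concrete, here is a minimal toy sketch in plain Python. It is not the paper's codec. Mean pooling stands in for temporal compression, a nearest-neighbour codebook lookup stands in for the discrete quantizer, and a fixed linear projection stands in for dimensionality reduction. The focal modulation layers and the hybrid Transformer are not modeled, and every name here (`mean_pool`, `nearest_code`, `hybrid_encode`, `proj`, `stride`) is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(frames: List[Vector], stride: int) -> List[Vector]:
    """Temporal compression (assumed form): average each run of `stride` frames."""
    usable = (len(frames) // stride) * stride  # drop a trailing partial group
    pooled = []
    for start in range(0, usable, stride):
        group = frames[start:start + stride]
        pooled.append([sum(col) / stride for col in zip(*group)])
    return pooled


def nearest_code(vec: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to `vec` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])),
    )


def hybrid_encode(
    frames: List[Vector],
    codebook: List[Vector],
    proj: List[Vector],
    stride: int,
) -> Tuple[List[int], List[Vector]]:
    """Return one discrete token and one reduced residual per compressed step."""
    tokens: List[int] = []
    residuals: List[Vector] = []
    for vec in mean_pool(frames, stride):
        k = nearest_code(vec, codebook)
        # Continuous part: whatever the discrete token failed to capture.
        res = [v - c for v, c in zip(vec, codebook[k])]
        # Each row of `proj` yields one coordinate of the reduced residual.
        residuals.append([sum(r * w for r, w in zip(res, row)) for row in proj])
        tokens.append(k)
    return tokens, residuals


# Toy data: 4 frames of dimension 2, compressed 2x in time, residual reduced to 1-D.
frames = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
proj = [[1.0, 1.0]]
tokens, residuals = hybrid_encode(frames, codebook, proj, stride=2)
print(tokens)          # [0, 1]: four input frames became two steps
print(len(residuals))  # 2
```

In this toy setup, the token sequence is half the length of the input, which is the sense in which temporal compression cuts the number of autoregressive steps. The low-dimensional residual carries detail that the codebook entry alone discards.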