[arXiv] score: 0.35

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

May 13, 2026
This paper shows that uniform intervention schedules imported from autoregressive models degrade the output quality of discrete diffusion language models (DLMs), especially under multi-attribute steering. Sparse autoencoders trained on four DLMs (124M–8B parameters) reveal that different attributes commit at distinct denoising steps, which enables mechanistically informed, non-uniform intervention schedules. Directly relevant to anyone doing controlled generation with MDLM-style models.
cs.LG cs.AI cs.CL