● builder: An open-source 8B diffusion reasoning model with parallel decoding is worth evaluating if you need faster inference than autoregressive alternatives on long chain-of-thought (CoT) tasks.
● researcher: The block-size curriculum finding, that training with large blocks degrades CoT reasoning while small blocks preserve it, is a concrete, reproducible insight for anyone working on discrete diffusion LMs.