[HUGGINGFACE] score: 0.55

Post-Conclusion CoT Continuation Degrades SFT Outcomes Even When Answers Are Correct

May 27, 2026

In answer-correct long-CoT training traces, reasoning that continues after a sufficiently supported answer acts as harmful supervision; removing these post-conclusion suffixes with a delete-only editor improves SFT outcomes without altering the answer. The finding suggests CoT data curation should audit trace endings, not just answer correctness.
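As a rough illustration of what a delete-only pass over trace endings might look like, the sketch below cuts each trace after the first line that states the final answer and reports how much text was removed. It is not the paper's editor, whose details are not given in this summary. The function names (`first_answer_step`, `trim_post_conclusion`, `suffix_fraction`) are hypothetical, and treating the first line that contains the answer as the point where it is "sufficiently supported" is a simplifying assumption.

```python
import re
from typing import List, Optional


def first_answer_step(steps: List[str], answer: str) -> Optional[int]:
    """Return the index of the first line containing `answer` as a whole token.

    Assumption: the first line that states the answer is treated as the
    conclusion. A real editor would need a stronger test of whether the
    answer is actually supported at that point.
    """
    token = re.compile(r"(?<!\w)" + re.escape(answer) + r"(?!\w)")
    for i, step in enumerate(steps):
        if token.search(step):
            return i
    return None


def trim_post_conclusion(trace: str, answer: str) -> str:
    """Delete-only edit: keep the trace up to and including the conclusion line.

    The result is always a prefix of the input, so no text is rewritten
    and the stated answer is left untouched.
    """
    steps = trace.splitlines(keepends=True)
    idx = first_answer_step(steps, answer)
    if idx is None:
        # Answer never appears verbatim: leave the trace unchanged.
        return trace
    return "".join(steps[: idx + 1])


def suffix_fraction(trace: str, answer: str) -> float:
    """Fraction of characters that follow the conclusion line (audit metric)."""
    if not trace:
        return 0.0
    kept = trim_post_conclusion(trace, answer)
    return (len(trace) - len(kept)) / len(trace)


if __name__ == "__main__":
    example = (
        "Let x be the number.\n"
        "Then 3x = 36, so x = 12.\n"
        "Wait, let me re-check by another method.\n"
        "36 / 3 = 12, same result.\n"
    )
    print(trim_post_conclusion(example, "12"))
    print(f"post-conclusion share: {suffix_fraction(example, '12'):.2f}")
```

A metric like `suffix_fraction` can be used to flag traces for review before any deletion is applied. The first-occurrence heuristic will misfire when the answer string shows up incidentally in an earlier step, so spot-checking trimmed traces is advisable.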

paper

HOW THIS AFFECTS YOU

● builder: You should audit long-CoT training data for post-conclusion continuation before fine-tuning reasoning models, as answer-correct traces with trailing reasoning can silently degrade model quality.

● researcher: This identifies a concrete, previously overlooked data quality issue in long-CoT SFT that is orthogonal to answer correctness filtering, warranting re-examination of existing reasoning training pipelines.

SOURCE

https://huggingface.co/papers/2605.29288
