[HUGGINGFACE] score: 0.55
Post-Conclusion CoT Continuation Degrades SFT Outcomes Even When Answers Are Correct
May 27, 2026
In answer-correct long-CoT training traces, reasoning that continues after a sufficiently supported answer acts as harmful supervision; removing these post-conclusion suffixes with a delete-only editor improves SFT outcomes without altering the answer. The finding suggests CoT data curation should audit trace endings, not just answer correctness.
HOW THIS AFFECTS YOU
● builder: You should audit long-CoT training data for post-conclusion continuation before fine-tuning reasoning models, as answer-correct traces with trailing reasoning can silently degrade model quality.
● researcher: This identifies a concrete, previously overlooked data quality issue in long-CoT SFT that is orthogonal to answer correctness filtering, warranting re-examination of existing reasoning training pipelines.
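
A minimal sketch of what such an audit might look like, assuming a line-oriented trace format and a simple marker-based heuristic for locating the conclusion. The summary does not describe how the paper's editor detects the conclusion point, so `CONCLUSION_MARKERS`, `trim_post_conclusion`, and `audit` are hypothetical names used only for illustration. The one property taken from the summary is that the edit is delete-only: the output is always a prefix of the input, and the answer line is kept unchanged.

```python
import re

# Hypothetical conclusion markers (an assumption for illustration only;
# the source does not say how the conclusion point is identified).
CONCLUSION_MARKERS = re.compile(r"\b(?:final answer|the answer is)\b", re.IGNORECASE)


def trim_post_conclusion(trace: str, answer: str) -> str:
    """Delete-only edit: keep the trace up to and including the first line
    that both states the answer and matches a conclusion marker.
    Returns the trace unchanged if no such line is found."""
    lines = trace.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if answer in line and CONCLUSION_MARKERS.search(line):
            # Nothing is rewritten or inserted; the suffix is simply dropped.
            return "".join(lines[: i + 1])
    return trace


def audit(examples):
    """Return the indices of (trace, answer) pairs whose trace continues
    with non-whitespace content after the detected conclusion line."""
    flagged = []
    for idx, (trace, answer) in enumerate(examples):
        trimmed = trim_post_conclusion(trace, answer)
        if len(trimmed.strip()) < len(trace.strip()):
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    examples = [
        ("2 + 2 = 4.\nSo the answer is 4.\nWait, let me double-check: 2 + 2 is 4.\n", "4"),
        ("3 * 3 = 9.\nSo the answer is 9.\n", "9"),
    ]
    print(audit(examples))
```

A marker-based heuristic like this will miss conclusions phrased in other ways and may cut too early when a marker appears in an intermediate step, so flagged traces are best treated as candidates for review rather than trimmed automatically.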