Formulaic Expression Desensitization Boosts Scientific Sentence Classification on Small Datasets
June 26, 2026
Formulaic expression desensitization augments small scientific paper datasets by abstracting domain-specific expressions to reduce model overfitting to surface forms, improving generalization for problem and method sentence extraction. The approach combines synthetic data generation with context enrichment to compensate for limited labeled data.
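As a rough illustration of the abstraction step, the sketch below replaces domain-specific terms with a generic placeholder and adds the abstracted sentences to the training set alongside the originals. It is a minimal sketch under stated assumptions, not the method from the paper: the term list `DOMAIN_TERMS`, the `[TERM]` placeholder, and the function names `desensitize` and `augment` are all hypothetical, and the summary does not say how the actual approach identifies which expressions to abstract.

```python
import re

# Hypothetical domain vocabulary (an assumption for illustration only);
# a real pipeline might obtain these from a term extractor or glossary.
DOMAIN_TERMS = ["convolutional neural network", "protein folding", "BERT"]


def desensitize(sentence, terms=DOMAIN_TERMS, placeholder="[TERM]"):
    """Replace each listed domain term with a generic placeholder."""
    if not terms:
        return sentence
    # Try longer terms first so multi-word terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, sentence)


def augment(dataset):
    """Return the original (sentence, label) pairs plus abstracted copies.

    A copy is added only when abstraction actually changed the sentence,
    so unchanged sentences are not duplicated.
    """
    augmented = list(dataset)
    for sentence, label in dataset:
        abstracted = desensitize(sentence)
        if abstracted != sentence:
            augmented.append((abstracted, label))
    return augmented


examples = [
    ("We fine-tune BERT on a small corpus.", "method"),
    ("Labeled data is scarce in this field.", "problem"),
]
augmented_examples = augment(examples)
```

Keeping the original sentences alongside the abstracted copies lets a classifier see both the surface form and the abstracted pattern, which is one plausible way to discourage reliance on specific domain wording.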
HOW THIS AFFECTS YOU
● Researcher: The desensitization-based augmentation technique is transferable to other low-resource scientific NLP tasks where models overfit to domain-specific phrasing.