GradSentry: Gradient Spectral Entropy for Backdoor Sample Filtering in Large Language Model Fine-Tuning
May 25, 2026
Poisoned samples produce higher spectral entropy in their per-sample gradients than clean samples, and GradSentry exploits this signal to filter backdoor data during LLM fine-tuning without clustering or pairwise comparisons. The method is training-agnostic: it supports both full fine-tuning and parameter-efficient approaches such as LoRA, and it is designed to remain effective at extreme poison ratios where clustering-based defenses break down.
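To make the filtering signal concrete, here is a minimal sketch of one way a spectral-entropy score could be computed and thresholded. It is not the paper's implementation. The summary does not define spectral entropy, so the sketch assumes it is the Shannon entropy of the normalized singular values of a per-sample gradient matrix. The function names `spectral_entropy` and `filter_by_entropy` and the fixed `threshold` are hypothetical.

```python
import numpy as np

def spectral_entropy(grad_matrix, eps=1e-12):
    """Shannon entropy of the normalized singular-value spectrum.

    Assumption: the per-sample gradient of one weight matrix is given
    as a 2-D array. This definition is illustrative, not the paper's.
    """
    s = np.linalg.svd(grad_matrix, compute_uv=False)
    p = s / (s.sum() + eps)   # normalize the spectrum to a distribution
    p = p[p > eps]            # drop numerically zero components
    return float(-(p * np.log(p)).sum())

def filter_by_entropy(per_sample_grads, threshold):
    """Keep samples whose score is at or below a hypothetical threshold."""
    scores = np.array([spectral_entropy(g) for g in per_sample_grads])
    keep = scores <= threshold  # higher entropy is treated as suspicious
    return keep, scores

# Toy inputs: a rank-1 gradient (concentrated spectrum, low entropy)
# and an identity-like gradient (flat spectrum, high entropy).
low = np.outer(np.arange(1.0, 5.0), np.arange(1.0, 5.0))
high = np.eye(4)
keep, scores = filter_by_entropy([low, high], threshold=0.5)
```

Each sample is scored independently, which is consistent with the claim that no clustering or pairwise comparison is needed. How the threshold is chosen in practice is not stated in the summary.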