
GradSentry: Gradient Spectral Entropy for Backdoor Sample Filtering in Large Language Model Fine-Tuning

May 25, 2026

Poisoned samples produce higher spectral entropy in their per-sample gradients than clean samples do. GradSentry exploits this signal to filter backdoor data during LLM fine-tuning without clustering or pairwise comparisons. The method is training-agnostic: it supports both full fine-tuning and parameter-efficient approaches such as LoRA, and it is designed to remain effective at extreme poison ratios, where clustering-based defenses break down.
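
To make the idea of gradient spectral entropy concrete, here is a minimal sketch, not the paper's implementation. It assumes one common definition of spectral entropy: the Shannon entropy of the singular values of a 2-D per-sample gradient, normalized to sum to one. The summary does not state the paper's exact definition, which layers it uses, or how it sets the cutoff. The function names `spectral_entropy` and `filter_by_entropy` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def spectral_entropy(grad: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy of the normalized singular-value spectrum.

    Assumption: `grad` is the gradient of one 2-D weight matrix for one sample.
    """
    s = np.linalg.svd(grad, compute_uv=False)  # singular values only
    p = s / (s.sum() + eps)                    # normalize to a distribution
    p = p[p > 0]                               # drop exact zeros before log
    return float(-(p * np.log(p)).sum())


def filter_by_entropy(grads, threshold: float):
    """Keep samples whose entropy is at or below a (hypothetical) threshold.

    Samples scoring above the threshold are treated as suspected poison.
    """
    scores = np.array([spectral_entropy(g) for g in grads])
    keep = np.flatnonzero(scores <= threshold)
    return keep, scores


# Toy inputs: a rank-1 gradient (concentrated spectrum, low entropy) and an
# identity-like gradient (flat spectrum, high entropy).
low = np.outer(np.arange(1.0, 5.0), np.arange(1.0, 5.0))
high = np.eye(4)
keep, scores = filter_by_entropy([low, high], threshold=0.5)
```

In this toy setup the rank-1 matrix scores near zero and the identity matrix scores log(4), so only the first sample is kept. Each sample is scored independently, which matches the summary's point that no clustering or pairwise comparison is needed.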

Source: https://huggingface.co/papers/2605.26574
