[arXiv] score: 0.68
Influence Functions Detect Backdoors and Mechanistic Anomalies Across Modalities
May 26, 2026
Reframing mechanistic anomaly detection as a functional attribution problem using influence functions and parameter-space sampling achieves state-of-the-art backdoor detection on BackdoorBench, outperforming latent-space methods that are vulnerable to obfuscation.
cs.LG, cs.CR
HOW THIS AFFECTS YOU
● researcher: The influence-function attribution approach is architecture- and modality-agnostic, making it a more generalizable baseline for mechanistic anomaly detection than existing latent-space methods.
● policy: Worth watching because it provides an obfuscation-resistant method for detecting backdoors and abnormal internal model behavior, which is relevant to supply-chain security and model auditing requirements.