[arXiv] score: 0.68

Influence Functions Detect Backdoors and Mechanistic Anomalies Across Modalities

May 26, 2026

The paper reframes mechanistic anomaly detection as a functional attribution problem, using influence functions and parameter-space sampling. The approach achieves state-of-the-art backdoor detection on BackdoorBench and outperforms latent-space methods, which are vulnerable to obfuscation.

cs.LG, cs.CR

HOW THIS AFFECTS YOU

● researcher: The influence-function attribution approach is architecture- and modality-agnostic, making it a more generalizable baseline for mechanistic anomaly detection than existing latent-space methods.

● policy: Worth watching because it provides an obfuscation-resistant method for detecting backdoors and abnormal internal model behavior, which is relevant to supply-chain security and model auditing requirements.

SOURCE

https://arxiv.org/abs/2604.18970
