[arXiv]
AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers
May 14, 2026
AGOP-Weighted repurposes the Average Gradient Outer Product (AGOP), the feature-learning matrix M (the average of ∇f(x) ∇f(x)^T over training inputs) at the center of the Neural Feature Ansatz, as a per-sample attribution method. It weights each sample's input gradient elementwise by a training-distribution prior, sqrt(diag(M) / max diag(M)). This bridges feature-learning theory and explainability for image classifiers. Practitioners gain a theoretically grounded saliency method that requires no additional forward passes.
cs.LG
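A minimal NumPy sketch of the weighting summarized above follows. It rests on stated assumptions and is not the paper's code. A hypothetical toy scalar model f(x) = v · tanh(Wx) stands in for a classifier logit, and inputs are flattened 16-pixel "images". The attribution is taken to be the raw input gradient multiplied elementwise by the prior. The names `input_gradient`, `agop`, and `agop_weighted_attribution` are illustrative. W's first four columns are zeroed, so those pixels never affect the output, and both the prior and the saliency come out exactly zero there.

```python
import numpy as np

# Hypothetical toy setup: f(x) = v . tanh(W x) stands in for one classifier logit.
# Inputs are flattened "images" with d pixels.
rng = np.random.default_rng(0)
d, h, n = 16, 8, 200
W = rng.normal(size=(h, d)) / np.sqrt(d)
W[:, :4] = 0.0  # pixels 0-3 never influence f, so their gradients are always zero
v = rng.normal(size=h)


def input_gradient(x: np.ndarray) -> np.ndarray:
    """Analytic gradient of f(x) = v . tanh(W x) with respect to x, shape (d,)."""
    z = W @ x
    return W.T @ (v * (1.0 - np.tanh(z) ** 2))


def agop(X: np.ndarray) -> np.ndarray:
    """AGOP over training inputs: M = (1/n) * sum_i g(x_i) g(x_i)^T, shape (d, d)."""
    G = np.stack([input_gradient(x) for x in X])  # (n, d) matrix of per-sample gradients
    return G.T @ G / len(X)


# Estimate M once on (synthetic) training data, then derive the prior from its diagonal.
X_train = rng.normal(size=(n, d))
M = agop(X_train)
m_diag = np.diag(M)
prior = np.sqrt(m_diag / m_diag.max())  # values in [0, 1], largest entry exactly 1


def agop_weighted_attribution(x: np.ndarray) -> np.ndarray:
    """Per-sample saliency: the sample's input gradient reweighted by the AGOP prior."""
    return prior * input_gradient(x)


x_test = rng.normal(size=d)
saliency = agop_weighted_attribution(x_test)
print(saliency.reshape(4, 4).round(3))  # view the 16 pixel scores as a 4x4 map
```

For a real image classifier one would obtain gradients with automatic differentiation. Only diag(M) enters the prior, so accumulating the mean squared gradient per pixel avoids forming the full d × d matrix.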