Gradient-based Epistemic Goggles to Mitigate Negation Neglect
July 3, 2026
Goggles is a learned module that intervenes on LoRA gradients during supervised finetuning to induce specific epistemic frames. It addresses Negation Neglect, where models fail to identify fictional claims as such, achieving better stance identification than traditional data annotation methods.
HOW THIS AFFECTS YOU
●
researcherYou can use this method to control how models perceive the truthfulness of training data without manual labeling.