[arXiv]

ReRULE Off-Policy Replay Improves LLM Unlearning Efficiency via Hard-Case Reuse

June 16, 2026

ReRULE augments GRPO-based reinforcement unlearning by storing low-reward, hard-case rollouts in a replay buffer and reusing them through importance-sampled off-policy updates. This concentrates training on the boundary between the forget and retain sets. The method targets a form of gradient starvation: easy cases converge quickly, while hard cases are discarded after a single use, so the most informative examples contribute little to training.
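The mechanism summarized above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not ReRULE's implementation. The names `HardCaseBuffer`, `reward_threshold`, and `mixed_batch_objective` are invented here. Log-probabilities are treated per sequence rather than per token. The clipped importance ratio is the standard PPO/GRPO surrogate, not a ReRULE-specific weighting. How rewards score forgetting versus retention is left abstract.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Rollout:
    prompt_id: str
    logprob_old: float      # sequence log-prob under the policy that generated it
    reward: float           # unlearning reward (definition assumed, not specified here)
    advantage: float = 0.0  # group-relative advantage, stored at collection time


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r - group mean) / (group std + eps).
    If every reward in the group is equal, all advantages are 0 (no gradient)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


class HardCaseBuffer:
    """Hypothetical FIFO buffer that keeps low-reward rollouts for later reuse."""

    def __init__(self, capacity: int, reward_threshold: float):
        self.capacity = capacity
        self.reward_threshold = reward_threshold
        self.items: List[Rollout] = []

    def add_group(self, rollouts: Sequence[Rollout]) -> None:
        # Keep only "hard" rollouts, i.e. those below the reward threshold.
        self.items.extend(r for r in rollouts if r.reward < self.reward_threshold)
        # Evict the oldest entries once capacity is exceeded.
        if len(self.items) > self.capacity:
            self.items = self.items[-self.capacity:]

    def sample(self, k: int, rng: random.Random) -> List[Rollout]:
        return rng.sample(self.items, min(k, len(self.items)))


def clipped_is_term(logprob_new: float, logprob_old: float,
                    advantage: float, clip_eps: float = 0.2) -> float:
    """PPO/GRPO clipped surrogate with importance ratio pi_new / pi_old.
    For replayed rollouts, pi_old is the stale policy that produced them."""
    ratio = math.exp(logprob_new - logprob_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def mixed_batch_objective(fresh: Sequence[Rollout], replayed: Sequence[Rollout],
                          logprob_new_fn: Callable[[Rollout], float],
                          clip_eps: float = 0.2) -> float:
    """Average clipped surrogate over fresh (on-policy) and replayed (off-policy) rollouts."""
    batch = list(fresh) + list(replayed)
    if not batch:
        return 0.0
    total = sum(clipped_is_term(logprob_new_fn(r), r.logprob_old, r.advantage, clip_eps)
                for r in batch)
    return total / len(batch)
```

When a group's rewards are identical, `group_advantages` returns all zeros and that group contributes no gradient. Mixing stored low-reward rollouts back into the batch keeps informative samples in play, and clipping the importance ratio limits how far a stale behavior policy can push the update.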

HOW THIS AFFECTS YOU

● Researcher: The replay-buffer framing of unlearning efficiency is a concrete algorithmic contribution to RL-based unlearning that can be layered onto existing GRPO pipelines.

● Policy: More efficient unlearning methods reduce the cost of removing hazardous knowledge, making compliance-driven model remediation more operationally feasible.
