Don't Break the Boundary: Continual Unlearning for OOD Detection Based on Free Energy Repulsion
Ningkang Peng, Kun Shao, Jingyang Mao, Linjing Qian, Xiaoqian Peng, Xichen Yang, Yanhui Gu
TL;DR
This work addresses boundary-preserving unlearning for out-of-distribution (OOD) detection. Conventional classification-focused unlearning distorts the in-distribution (ID) manifold and degrades anomaly discrimination. The paper reframes forgetting as transforming the target class into an OOD-like state. It then introduces TFER, a Push-Pull framework that pairs a Total Free Energy Repulsion objective (the Push) with a Pull mechanism that anchors retained-class prototypes, and implements both with LoRA adapters for parameter efficiency. A theoretical analysis based on convex geometry shows that updates stay within the convex hull of the retained-class gradients, which keeps optimization stable and avoids abrupt manifold collapse. On CIFAR-100, TFER achieves strong forgetting efficacy, high utility preservation on retained classes, and robust OOD generalization. It also delivers substantial efficiency gains and a scalable continual unlearning strategy based on modular orthogonality. Overall, the approach offers a practical path toward privacy-compliant, correctable open-world systems that maintain reliable OOD detection.
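TFER is described above as running on LoRA adapters. The following is a minimal sketch that assumes the standard LoRA parameterization. The paper's adapter placement, rank `r`, scaling `alpha`, and initialization are not given here, so the values below are hypothetical. A frozen weight W is augmented with a trainable low-rank product B A, and only A and B would be updated during unlearning. `LoRALinear` is an illustrative name, and the code is written in dependency-free Python.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) would be
    trained. B starts at zero, so the layer initially matches the frozen model.
    """

    def __init__(self, weight, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight; never modified here
        d_out, d_in = len(weight), len(weight[0])
        self.r = r
        self.scale = alpha / r
        # Small random A, zero B: the low-rank update B @ A is zero at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)              # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        # r * d_in entries in A plus d_out * r entries in B.
        return self.r * (len(self.A[0]) + len(self.B))


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    layer = LoRALinear(W, r=1, alpha=2.0)
    print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.0]: B is zero at init
    print(layer.num_trainable())           # 5 = 1 * (3 + 2)
```

`B` starts at zero, so the adapted layer initially reproduces the frozen layer exactly, and unlearning begins from the original model. Consider a hypothetical 512 × 512 layer with `r = 8`. Its adapter has 8 × (512 + 512) = 8,192 trainable parameters, versus 262,144 in the full weight, a 32× reduction.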
Abstract
Deploying trustworthy AI in open-world environments poses a dual challenge. Robust Out-of-Distribution (OOD) detection is needed to ensure system safety, while flexible machine unlearning is needed for privacy compliance and model rectification. Meeting both requirements exposes a fundamental geometric contradiction. Current OOD detectors rely on a static, compact data manifold, whereas traditional classification-oriented unlearning methods disrupt this delicate structure, so erasing target classes catastrophically degrades the model's ability to discriminate anomalies. To resolve this dilemma, we first define the problem of boundary-preserving class unlearning and propose a pivotal conceptual shift: in the context of OOD detection, effective unlearning is mathematically equivalent to transforming the target class into OOD samples. Building on this insight, we propose the TFER (Total Free Energy Repulsion) framework. Inspired by the free energy principle, TFER constructs a novel Push-Pull game mechanism. A pull mechanism anchors retained classes within a low-energy ID manifold, while a free energy repulsion force actively expels forgotten classes to high-energy OOD regions. The approach is implemented via parameter-efficient fine-tuning, which avoids the prohibitive cost of full retraining. Extensive experiments demonstrate that TFER achieves precise unlearning while maximally preserving the model's discriminative performance on the remaining classes and on external OOD data. More importantly, our study reveals that the Push-Pull equilibrium of TFER gives the model inherent structural stability. This stability lets it resist catastrophic forgetting without complex additional constraints and shows exceptional potential for continual unlearning tasks.
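The Push-Pull mechanism can be made concrete with the free energy score commonly used for energy-based OOD detection. That score is E(x) = -T log Σ_k exp(f_k(x)/T), where low energy indicates ID and high energy indicates OOD. The abstract does not give TFER's exact objective, so the sketch below substitutes squared-hinge energy margins, and this choice is an assumption. The Pull term is a stand-in for TFER's prototype anchoring: it keeps retained samples correctly classified with energy below a hypothetical margin `m_in`. The Push term repels forgotten-class samples until their energy exceeds a hypothetical margin `m_out`. The names `push_pull_loss`, `m_in`, `m_out`, and `lam` are illustrative and do not come from the paper.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def free_energy(logits, T=1.0):
    """E(x) = -T * log sum_k exp(f_k(x) / T); lower energy = more ID-like."""
    return -T * logsumexp([z / T for z in logits])


def cross_entropy(logits, label):
    return logsumexp(logits) - logits[label]


def push_pull_loss(retain_batch, forget_batch, m_in=-7.0, m_out=-3.0, lam=0.1, T=1.0):
    """Hypothetical Push-Pull objective with squared-hinge energy margins.

    retain_batch: list of (logits, label) pairs from retained classes.
    forget_batch: list of logits from samples of the class being unlearned.
    """
    # Pull: stay correctly classified and keep energy below m_in (ID region).
    pull = 0.0
    for logits, label in retain_batch:
        pull += cross_entropy(logits, label)
        pull += lam * max(0.0, free_energy(logits, T) - m_in) ** 2
    # Push: penalize forgotten samples whose energy is still below m_out.
    push = 0.0
    for logits in forget_batch:
        push += lam * max(0.0, m_out - free_energy(logits, T)) ** 2
    return pull / max(1, len(retain_batch)) + push / max(1, len(forget_batch))


if __name__ == "__main__":
    retain = [([10.0, 0.0, 0.0], 0)]
    print(free_energy([0.0, 0.0, 0.0]))               # -log 3
    print(push_pull_loss(retain, [[0.0, 0.0, 0.0]]))  # near zero
    print(push_pull_loss(retain, [[8.0, 0.0, 0.0]]))  # about 2.5
```

Consider a forgotten sample that the model still assigns confidently to its old class (logits `[8, 0, 0]`). It has energy of about -8 and incurs a push penalty of about 2.5 with the defaults above. A flat logit vector, by contrast, has energy -log 3 ≈ -1.10, already above `m_out`, and incurs none. At test time a sample would be flagged OOD when its energy exceeds a chosen threshold, so a successfully unlearned class ends up on the OOD side of the boundary.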
