Machine Unlearning using Forgetting Neural Networks
Amartya Hatua, Trung T. Nguyen, Filip Cano, Andrew H. Sung
TL;DR
This paper introduces Forgetting Neural Networks (FNNs), a neuroscience-inspired approach to machine unlearning that embeds explicit forgetting through per-neuron forgetting factors and targeted attenuation of activations and weights. The training data is divided into a retain set and a forget set, and an iterative learn/unlearn protocol, guided by activation-based rankings, erases information about the forget set while preserving utility on the retain set. Empirical results on MNIST and Fashion-MNIST show that varying forgetting rates assigned by rank can approach the retain-set accuracy of retraining while minimizing membership inference leakage, although aggressive settings can cause over-forgetting. Overall, FNNs offer an interpretable, efficient framework for targeted unlearning with practical privacy benefits and a clear connection to cognitive forgetting principles.
Abstract
Modern computer systems store vast amounts of personal data, enabling advances in AI and ML but putting user privacy and trust at risk. For privacy reasons, it is sometimes desirable for an ML model to forget part of the data it was trained on. In this paper, we introduce a novel unlearning approach based on Forgetting Neural Networks (FNNs), a neuroscience-inspired architecture that explicitly encodes forgetting through multiplicative decay factors. While FNNs have previously been studied as a theoretical construct, we provide the first concrete implementation and demonstrate their effectiveness for targeted unlearning. We propose several variants with per-neuron forgetting factors, including rank-based assignments guided by activation levels, and evaluate them on the MNIST and Fashion-MNIST benchmarks. Our method systematically removes information associated with forget sets while preserving performance on retained data. Membership inference attacks confirm the effectiveness of FNN-based unlearning in erasing information about the forget set from the neural network. These results establish FNNs as a promising foundation for efficient and interpretable unlearning.
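
As a rough illustration of what a rank-based, per-neuron multiplicative forgetting factor could look like, the sketch below ranks neurons by their mean activation on the forget set and attenuates the most active ones most strongly. It is not the paper's implementation. The exponential decay form, the linear mapping from rank to rate, and the names `rank_based_rates`, `forgetting_factors`, and `attenuate` are all assumptions made for this example.

```python
import math

def rank_based_rates(mean_activations, min_rate=0.0, max_rate=1.0):
    """Assign a forgetting rate to each neuron from its activation rank.

    Hypothetical scheme: the least active neuron gets min_rate, the most
    active gets max_rate, and the rest are spaced linearly in between.
    """
    n = len(mean_activations)
    if n == 1:
        return [max_rate]
    # Neuron indices ordered from least to most active on the forget set.
    order = sorted(range(n), key=lambda i: mean_activations[i])
    rates = [0.0] * n
    for rank, idx in enumerate(order):
        rates[idx] = min_rate + (max_rate - min_rate) * rank / (n - 1)
    return rates

def forgetting_factors(rates, t):
    """Per-neuron multiplicative decay factor after t unlearning steps.

    The exponential form exp(-rate * t) is an assumption of this sketch.
    """
    return [math.exp(-r * t) for r in rates]

def attenuate(activations, factors):
    """Scale each neuron's activation by its forgetting factor."""
    return [a * f for a, f in zip(activations, factors)]

# Toy example: three neurons with made-up mean activations on a forget set.
mean_act = [0.2, 1.5, 0.7]
rates = rank_based_rates(mean_act)
factors = forgetting_factors(rates, t=1.0)
attenuated = attenuate(mean_act, factors)
```

In this toy setup the least active neuron is left untouched (its factor is 1) while the most active neuron is damped the most, which mirrors the idea of targeting the units that respond most strongly to the forget set.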
