Machine Unlearning using Forgetting Neural Networks

Amartya Hatua, Trung T. Nguyen, Filip Cano, Andrew H. Sung

TL;DR

This paper introduces Forgetting Neural Networks (FNNs) as a neuroscience-inspired approach to machine unlearning, embedding explicit forgetting through per-neuron forgetting factors and targeted attenuation of activations and weights. By dividing data into retain and forget sets, the method applies an iterative learn/unlearn protocol, guided by activation-based rankings, to erase information about the forget set while preserving utility on the retain set. Empirical results on MNIST and Fashion-MNIST show that rank-based varying forgetting rates can approach retraining performance in retained accuracy while minimizing membership inference leakage, though the approach can exhibit over-forgetting under aggressive settings. Overall, FNNs offer an interpretable, efficient framework for targeted unlearning with practical privacy benefits and a clear connection to cognitive forgetting principles.

Abstract

Modern computer systems store vast amounts of personal data, enabling advances in AI and ML but risking user privacy and trust. For privacy reasons, it is sometimes desirable for an ML model to forget part of the data it was trained on. In this paper, we introduce a novel unlearning approach based on Forgetting Neural Networks (FNNs), a neuroscience-inspired architecture that explicitly encodes forgetting through multiplicative decay factors. While FNNs have previously been studied as a theoretical construct, we provide the first concrete implementation and demonstrate their effectiveness for targeted unlearning. We propose several variants with per-neuron forgetting factors, including rank-based assignments guided by activation levels, and evaluate them on the MNIST and Fashion-MNIST benchmarks. Our method systematically removes information associated with forget sets while preserving performance on retained data. Membership inference attacks confirm the effectiveness of FNN-based unlearning in erasing information about the training data from the neural network. These results establish FNNs as a promising foundation for efficient and interpretable unlearning.
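
To make the idea of rank-based, per-neuron forgetting factors concrete, the following is a minimal sketch and not the authors' implementation. It assumes that neurons more active on the forget set receive larger forgetting rates, that rates are spaced linearly by rank, and that the multiplicative decay factor takes an exponential form; the abstract does not specify any of these. The function names (`rank_based_rates`, `forgetting_factors`, `attenuate`) and the parameters `min_rate`, `max_rate`, and `t` are hypothetical.

```python
import numpy as np


def rank_based_rates(mean_activations, min_rate=0.0, max_rate=1.0):
    """Map each neuron's activation rank to a forgetting rate.

    Assumption: rates are spaced linearly between min_rate and max_rate,
    with the most active neuron receiving max_rate.
    """
    acts = np.asarray(mean_activations, dtype=float)
    n = acts.size
    if n == 1:
        return np.full(1, max_rate)
    # Double argsort gives ranks: 0 for the least active neuron, n - 1 for the most.
    ranks = np.argsort(np.argsort(acts))
    return min_rate + (max_rate - min_rate) * ranks / (n - 1)


def forgetting_factors(rates, t):
    """Per-neuron multiplicative decay after t unlearning steps.

    Assumption: exponential decay exp(-rate * t); the paper's exact
    functional form may differ.
    """
    return np.exp(-np.asarray(rates, dtype=float) * t)


def attenuate(activations, factors):
    """Scale a batch of activations (batch, neurons) by per-neuron factors."""
    return np.asarray(activations, dtype=float) * factors


# Hypothetical mean activations of three hidden neurons on the forget set.
forget_set_activations = np.array([0.2, 1.5, 0.7])
rates = rank_based_rates(forget_set_activations)
factors = forgetting_factors(rates, t=2)
attenuated = attenuate(np.ones((4, 3)), factors)
```

With these assumptions, the neuron that responds most strongly to the forget set is attenuated the most, while the least active neuron is left unchanged when `min_rate` is zero. A fixed-rate variant would correspond to passing the same rate for every neuron.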

Paper Structure

This paper contains 38 sections, 3 equations, 8 figures, 1 algorithm.

Figures (8)

  • Figure 1: Diagram of the implemented model. Built with the help of FNN_Diagrams.
  • Figure 2: Learning-unlearning curve for FFR FNN.
  • Figure 3: Learning-unlearning curve for VFR FNN on the MNIST HDR dataset.
  • Figure 4: Learning-unlearning curves for VFR FNN on the MNIST Fashion dataset.
  • Figure 5: Learning-unlearning curve & MIA score for fixed forgetting rate networks.
  • ...and 3 more figures