Machine Unlearning: Linear Filtration for Logit-based Classifiers

Thomas Baumhauer, Pascal Schöttle, Matthias Zeppelzauer

TL;DR

This work tackles the data-deletion challenge that privacy regulations pose for machine learning. It focuses on class-wide deletion in logit-based classifiers and introduces linear filtration, a fast weak-unlearning method that can be absorbed into the final layer. The approach constructs a filtration that linearly transforms the classifier's weight matrix to erase the influence of a deleted class while preserving performance on the remaining classes. The authors formalize weak unlearning and evaluate it adversarially with a binary attacker trained on pre-softmax outputs. They show that normalizing filtration significantly reduces leakage, as evidenced by "seen" and "not seen" output distributions that become harder to distinguish, and that it mitigates model-inversion reconstructions of the deleted class. While promising, the method remains a shallow, black-box technique that operates only on the final layer, leaving room for deeper integration and stronger guarantees. It offers a practical complement to existing unlearning frameworks in scenarios where a class belongs to a single data owner and must be sanitized after training.
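Because a linear filtration is a matrix $F$ applied to the logits, the identity $F(W f(x)) = (FW) f(x)$ lets it be folded into the final layer, so the filtered classifier again has the form $\sigma \circ W' \circ f$. The pure-Python sketch below checks this numerically on a hypothetical toy model; the weights `W`, the feature vector, and the helper `naive_deletion_filtration` are illustrative assumptions. As its filtration it uses the naive scheme of simply dropping the deleted class's logit, which we assume here as a stand-in; the paper's normalizing filtration is a different matrix and is not reproduced.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax (the sigma in sigma o W o f)."""
    mx = max(z)
    exps = [math.exp(zi - mx) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def naive_deletion_filtration(k, c):
    """Hypothetical (k-1) x k filtration that drops logit c and keeps the rest."""
    return [[1.0 if j == (i if i < c else i + 1) else 0.0 for j in range(k)]
            for i in range(k - 1)]


# Hypothetical toy classifier: k = 3 classes, 2-dimensional features f(x).
W = [[1.0, -0.5],
     [0.2, 0.8],
     [-1.0, 0.3]]
features = [0.6, -1.2]  # stands in for f(x)

F = naive_deletion_filtration(k=3, c=0)  # "unlearn" class 0

# Option 1: apply F on top of the logits layer.
filtered_logits = matvec(F, matvec(W, features))

# Option 2: absorb F into the final layer once, W' = F W.
W_filtered = matmul(F, W)
absorbed_logits = matvec(W_filtered, features)

print(softmax(filtered_logits))
print(softmax(absorbed_logits))
```

Both printed distributions coincide, and `W_filtered` has one row fewer than `W`, so the sanitized model needs no additional layer at inference time.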

Abstract

Recently enacted legislation grants individuals certain rights to decide in what fashion their personal data may be used, and in particular a "right to be forgotten". This poses a challenge to machine learning: how to proceed when an individual retracts permission to use data that has been part of the training process of a model? From this question emerges the field of machine unlearning, which could be broadly described as the investigation of how to "delete training data from models". Our work complements this direction of research for the specific setting of class-wide deletion requests for classification models (e.g., deep neural networks). As a first step, we propose linear filtration as an intuitive, computationally efficient sanitization method. Our experiments demonstrate benefits in an adversarial setting over naive deletion schemes.

Paper Structure

This paper contains 17 sections, 19 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Results of a model inversion attack on a toy model trained on the AT&T Faces dataset with 4 classes. For one of the classes, from left to right: one of the training images, the reconstruction of the class by model inversion, the reconstruction after naive unlearning, and the reconstruction after unlearning by our proposed normalizing linear filtration (defined in the method section). Normalizing linear filtration leaves the reconstructions of the other classes visually unchanged; see Figure 5.
  • Figure 2: Schematic representation of $\sigma \circ h$ for a classifier $h = W \circ f$ in the hypothesis class considered throughout this paper. Here $f$ denotes a feature extraction, $W$ is a linear transformation and $\sigma$ is the softmax function. In deep learning terminology, "Logits" represents a fully connected layer with $k$ units and weights $W$.
  • Figure 3: The probability distribution predicted for class "airplane" after it has been unlearned, by either normalization or randomization, from models trained on CIFAR-10, compared with models retrained without class "airplane". Each bar is centered on the mean and has a length of one standard deviation, computed over $100$ models.
  • Figure 4: Experimental setup. On the full training data we train $100$ models, $\mathbf{A} = \mathrm{train}()$, and apply an unlearning operation $\mathfrak{D} = \mathrm{unlearn}()$ to each of them. We then run predict() on our test data with each of these models and label the predictions "seen". On the training data with $\mathcal{C}$ removed we train $100$ models, $\mathbf{A}_{\lnot \mathcal{C}} = \mathrm{train}()$, run predict() on our test data with each of them, and label these predictions "not seen". Finally, we use all labeled predictions as the training and test data of a binary classifier $b$, which serves as our "attack model". We interpret a low test accuracy of $b$ (i.e., close to the 50% chance level of this balanced task) as evidence that the weak unlearning operation performs well.
  • Figure 5: Model inversion for a toy model trained on the AT&T Faces dataset with 4 classes. The top row shows one training image of each class; the second row, reconstructions of the classes by model inversion; the third row, reconstructions after naive unlearning of the class in the first column; and the bottom row, reconstructions after unlearning the class in the first column by normalizing filtration. See Figure 6 for the remaining classes.
  • ...and 1 more figure
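The attack in Figure 4 reduces to a distinguishability test: if the attacker $b$ can separate "seen" from "not seen" outputs, information about the deleted class has leaked. The pure-Python sketch below illustrates only this scoring step, under loudly stated assumptions: synthetic Gaussian vectors from the hypothetical helper `fake_outputs` replace real model predictions, 9 dimensions are chosen to mimic CIFAR-10 with one class removed, and a simple nearest-centroid rule replaces the paper's attack model, whose architecture is not specified here.

```python
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def attack_accuracy(seen, not_seen, train_frac=0.5):
    """Fit a nearest-centroid attacker on part of each set; return test accuracy.

    Label 1 = "seen" (unlearned model), label 0 = "not seen" (retrained model).
    """
    n_s = int(len(seen) * train_frac)
    n_n = int(len(not_seen) * train_frac)
    c_seen = centroid(seen[:n_s])
    c_not = centroid(not_seen[:n_n])
    test = [(v, 1) for v in seen[n_s:]] + [(v, 0) for v in not_seen[n_n:]]
    correct = sum(
        (1 if sq_dist(v, c_seen) < sq_dist(v, c_not) else 0) == label
        for v, label in test
    )
    return correct / len(test)


rng = random.Random(0)


def fake_outputs(n, shift, dim=9):
    """Hypothetical stand-in for pre-softmax outputs collected from many models."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


# Leaky unlearning: "seen" outputs are systematically shifted.
leaky = attack_accuracy(fake_outputs(400, 1.5), fake_outputs(400, 0.0))
# Ideal weak unlearning: both output distributions are identical.
clean = attack_accuracy(fake_outputs(400, 0.0), fake_outputs(400, 0.0))
print(f"distinguishable: {leaky:.2f}, indistinguishable: {clean:.2f}")
```

With shifted distributions the attacker succeeds almost always, while with identical distributions its accuracy stays near chance, which is the outcome the setup in Figure 4 treats as evidence of good weak unlearning.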

Theorems & Definitions (2)

  • Definition 3.1: Unlearning
  • Definition 3.2: Weak unlearning