Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations

Aditya Golatkar, Alessandro Achille, Stefano Soatto

TL;DR

This work tackles data forgetting in deep networks by measuring the information left in the activations, not only in the weights, and by introducing a one-shot, NTK-inspired scrubbing procedure. It derives information-theoretic bounds on what can be extracted about the forgotten cohort under both white-box and black-box access, and shows that the black-box bound scales favorably with the number of queries in over-parameterized models. The proposed NTK-based scrubbing moves the model toward $w({\mathcal{D}_r})$, the weights obtained by training only on the retained data ${\mathcal{D}_r}$, and adds carefully calibrated noise to destroy information about the forgotten cohort, achieving better performance across readouts than prior methods. Experiments on CIFAR-10 and Lacuna-10 demonstrate improved forgetting metrics, reduced membership leakage, and a favorable error-forgetting trade-off, though computational scalability remains a challenge for large-scale deployment.

Abstract

We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions and can be extended to ensure forgetting in the activations of the network. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for which only the input-output behavior is observed. The proposed forgetting procedure has a deterministic part derived from the differential equations of a linearized version of the model, and a stochastic part that ensures information destruction by adding noise tailored to the geometry of the loss landscape. We exploit the connections between the activation and weight dynamics of a DNN inspired by Neural Tangent Kernels to compute the information in the activations.
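
The two-part structure described above (a deterministic shift plus added noise) can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a ridge regression model, for which the linearized dynamics are exact and the shift toward the retained-data solution reduces to a single Newton step. The names `ridge_fit` and `scrub`, the regularizer `lam`, and the isotropic Gaussian noise with scale `sigma` are all illustrative assumptions; the paper tailors the noise to the geometry of the loss landscape, which this sketch does not attempt.

```python
import numpy as np


def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression weights (toy stand-in for training)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def scrub(w_full, X_r, y_r, lam=1e-2, sigma=0.0, rng=None):
    """One-shot scrub of a ridge model trained on the full dataset.

    Deterministic part: a Newton step on the retained-data loss, which
    lands exactly on the retained-data solution because the loss is quadratic.
    Stochastic part: isotropic Gaussian noise (a simplifying assumption).
    """
    d = X_r.shape[1]
    H_r = X_r.T @ X_r + lam * np.eye(d)      # Hessian of the retained loss
    grad_r = H_r @ w_full - X_r.T @ y_r      # its gradient at w_full
    shift = -np.linalg.solve(H_r, grad_r)    # Newton step toward w(D_r)
    rng = np.random.default_rng() if rng is None else rng
    noise = sigma * rng.standard_normal(d)   # hypothetical noise model
    return w_full + shift + noise
```

With `sigma=0.0` the scrubbed weights coincide with the weights fitted on the retained data alone; increasing `sigma` trades accuracy for less residual information, which is the error-forgetting trade-off the summary refers to.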

Paper Structure

This paper contains 38 sections, 4 theorems, 31 equations, 8 figures.

Key Result

Lemma

We have the following upper bound: where $p(f_{S(w)}(\mathbf{x}) | {\mathcal{D}}={\mathcal{D}_f} \cup {\mathcal{D}_r})$ is the distribution of activations after training on the complete dataset ${\mathcal{D}_f} \sqcup {\mathcal{D}_r}$ and scrubbing. Similarly, $p(f_{S_0(w)}(\mathbf{x}) | {\mathcal{D}}={\mathcal{D}_r})$ is the distrib
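
The displayed inequality did not survive extraction. Judging from the lemma's title in the list of theorems (a computable bound on mutual information) and the two distributions named in the text, the bound presumably has the shape below. The left-hand side and the expectation over ${\mathcal{D}_f}$ are assumptions made for illustration, not text recovered from the source:

```latex
I\big(\mathcal{D}_f;\, f_{S(w)}(\mathbf{x})\big)
\;\le\;
\mathbb{E}_{\mathcal{D}_f}\Big[
  \mathrm{KL}\big(
    p(f_{S(w)}(\mathbf{x}) \mid \mathcal{D}=\mathcal{D}_f \cup \mathcal{D}_r)
    \,\big\|\,
    p(f_{S_0(w)}(\mathbf{x}) \mid \mathcal{D}=\mathcal{D}_r)
  \big)
\Big]
```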

Figures (8)

  • Figure 1: Scrubbing procedure. PCA projection of the training paths on ${\mathcal{D}}$ (blue) and ${\mathcal{D}_r}$ (orange), and of the weights after scrubbing, using (Left) the Fisher method of Golatkar et al. (2019) and (Right) the proposed scrubbing method. The proposed procedure (red cross) moves the model towards $w({\mathcal{D}_r})$, which reduces the amount of noise (point cloud) that needs to be added to achieve forgetting.
  • Figure 2: Comparison of baseline models (original, finetune) and forgetting methods (the Fisher method of Golatkar et al. (2019) and the proposed NTK method) using several readout functions ((Top) CIFAR and (Bottom) Lacuna). We benchmark them against a model that has never seen the data (the gold reference for forgetting): the values (mean and standard deviation) measured from this model correspond to the green region. An optimal scrubbing procedure should lie in the green region; otherwise it leaks information about ${\mathcal{D}_f}$. We compute the following readout functions: (a) Error on the forget set ${\mathcal{D}_f}$. (b) Error on the retain set ${\mathcal{D}_r}$. (c) Error on the test set $\mathcal{D}_\text{test}$. (d) Black-box membership inference attack: we construct a simple yet effective membership attack using the entropy of the output probabilities, and measure how often the attack model (using the activations of the scrubbed network) classifies a sample belonging to ${\mathcal{D}_f}$ as a training sample rather than being fooled by the scrubbing. (e) Re-learn time for different scrubbing methods: how fast a scrubbed model learns the forgotten cohort when fine-tuned on the complete dataset. We measure the re-learn time as the first epoch at which the loss on ${\mathcal{D}_f}$ goes below a certain threshold.
  • Figure 3: Error-forgetting trade-off. With the proposed scrubbing procedure, changing the variance of the noise reduces the remaining information in the weights (white-box bound, left) and in the activations (black-box bound, center). However, this comes at the cost of increasing the test error. Notice that the bound on activation is much sharper than the bound on error at the same accuracy. (Right) Different samples leak different amounts of information: an attacker querying samples from ${\mathcal{D}_f}$ can gain much more information than one querying unrelated images. This suggests that adversarial samples may be created to leak even more information.
  • Figure 4: Scrubbing brings activations closer to the target. We plot the $L_1$ norm of the difference between the final activations (post-softmax) of the target model trained only on ${\mathcal{D}_r}$ and those of models sampled along the line joining the original model $w({\mathcal{D}})$ ($\alpha=0$) and the proposed scrubbed model ($\alpha=1$). The distance between the activations decreases as we move along the scrubbing direction. The $L_1$ distance is already low on the retain set ${\mathcal{D}_r}$ (red), as it corresponds to the data common to $w({\mathcal{D}})$ and $w({\mathcal{D}_r})$. However, the two models differ on the forget set ${\mathcal{D}_f}$ (blue), and we observe that the $L_1$ distance decreases as we move along the proposed scrubbing direction.
  • Figure 5: Isosceles Trapezium Trick: $\|w({\mathcal{D}_r})-w({\mathcal{D}})\|=\|w_\text{lin}({\mathcal{D}_r})-w_\text{lin}({\mathcal{D}})\|+2\sin \alpha \|w_\text{lin}({\mathcal{D}})-w({\mathcal{D}})\|$. This allows us to match the outputs of the linear dynamic model with the real output without having to match the effective learning rate of the two, while being more robust to misestimation of the curvature by the linearized model.
  • ...and 3 more figures
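
Two of the readout functions in the Figure 2 caption, the entropy-based membership attack (d) and the re-learn time (e), can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the attack is reduced to a fixed entropy threshold (the caption only says the attack uses the entropy of the output probabilities), and the function names, the `threshold` parameters, and the 1-based epoch convention are hypothetical.

```python
import numpy as np


def output_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def flagged_as_training(probs, threshold):
    """Guess 'seen during training' when the prediction is confident,
    i.e. when the output entropy falls below the (assumed) threshold."""
    return output_entropy(probs) < threshold


def attack_success_rate(probs_forget, threshold):
    """Fraction of forget-set samples the attack labels as training data."""
    return float(np.mean(flagged_as_training(probs_forget, threshold)))


def relearn_time(forget_losses_per_epoch, threshold):
    """First epoch (counted from 1) at which the forget-set loss drops
    below the threshold; None if it never does."""
    for epoch, loss in enumerate(forget_losses_per_epoch, start=1):
        if loss < threshold:
            return epoch
    return None
```

For a well-scrubbed model, the attack success rate on ${\mathcal{D}_f}$ and the re-learn time should both be close to the values obtained from a model that never saw ${\mathcal{D}_f}$.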

Theorems & Definitions (4)

  • Lemma: Computable bound on mutual information
  • Lemma
  • Proposition
  • Proposition