Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations
Aditya Golatkar, Alessandro Achille, Stefano Soatto
TL;DR
This work tackles data forgetting in deep networks by considering what the network's activations (its input-output behavior) reveal, in addition to its weights, and by introducing a one-shot, NTK-inspired scrubbing procedure. It derives information-theoretic bounds on the information that can be extracted under both white-box and black-box access, and shows that the black-box bound scales favorably with the number of queries in over-parameterized models. The proposed NTK-based scrubbing moves the model toward $w(D_r)$, the weights that would be obtained by training only on the retained data $D_r$, and adds carefully calibrated noise to destroy information about the forgotten cohort, outperforming prior methods across readout functions. Experiments on CIFAR-10 and Lacuna-10 demonstrate improved forgetting metrics, reduced membership leakage, and a favorable error-forgetting trade-off, though computational cost remains an obstacle to large-scale deployment.
Abstract
We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions and can be extended to ensure forgetting in the activations of the network. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for which only the input-output behavior is observed. The proposed forgetting procedure has a deterministic part derived from the differential equations of a linearized version of the model, and a stochastic part that ensures information destruction by adding noise tailored to the geometry of the loss landscape. Inspired by Neural Tangent Kernels, we exploit the connections between the activation and weight dynamics of a DNN to compute the information in the activations.
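
As a rough illustration of the two-part procedure described above, the following NumPy sketch applies it to a model that is exactly linear in its weights (ridge regression), standing in for the linearized network. It is not the authors' implementation. The function names, the regularizer lam, the noise scale sigma, and the choice of noise covariance $\sigma^2 H_r^{-1}$ (with $H_r$ the Hessian of the retained-data loss) are all assumptions made for illustration.

```python
import numpy as np


def ridge_solution(X, y, lam):
    """Minimizer of 0.5*||X w - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def scrub(w_full, X_r, y_r, lam, sigma, rng):
    """One-shot scrub that uses only the retained data (X_r, y_r)."""
    d = X_r.shape[1]
    # Hessian and gradient of the retained-data loss at the current weights.
    H_r = X_r.T @ X_r + lam * np.eye(d)
    g_r = X_r.T @ (X_r @ w_full - y_r) + lam * w_full
    # Deterministic part: one Newton step. For a quadratic loss this lands
    # exactly on w(D_r), the weights trained on the retained data alone.
    w_shifted = w_full - np.linalg.solve(H_r, g_r)
    # Stochastic part (assumed form): Gaussian noise with covariance
    # sigma^2 * H_r^{-1}, i.e. larger along flat directions of the loss.
    L = np.linalg.cholesky(H_r)  # H_r = L @ L.T
    z = rng.standard_normal(d)
    noise = sigma * np.linalg.solve(L.T, z)
    return w_shifted + noise


rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(60)
X_r, y_r = X[:50], y[:50]  # retained cohort D_r; the last 10 rows are forgotten

w_full = ridge_solution(X, y, lam=1e-2)  # weights trained on all of D
w_scrubbed = scrub(w_full, X_r, y_r, lam=1e-2, sigma=0.05, rng=rng)
```

With sigma set to 0, the output of scrub coincides with ridge_solution(X_r, y_r, lam), because the loss here is exactly quadratic. For an actual deep network the linearization holds only approximately, so the deterministic shift would not be expected to reach $w(D_r)$ exactly; the added noise is what the abstract relies on to destroy the remaining information.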
