Training Data Reconstruction: Privacy due to Uncertainty?
Christina Runkel, Kanchana Vaishnavi Gandikota, Jonas Geiping, Carola-Bibiane Schönlieb, Michael Moeller
TL;DR
This work studies the privacy risks of reconstructing training data from neural network parameters by casting the task as a bilevel optimisation problem: minimise $l(\theta^*, \theta(x,y))$ subject to $\theta(x,y)$ solving the lower-level training problem. It demonstrates that reconstruction outcomes depend critically on how the input $x$ is initialised: random initialisations produce plausible-looking images that are not ground-truth samples, whereas initialisations near the ground truth can recover actual training samples, though membership in the training set cannot be certified. The authors compare the bilevel approach to the DecoReco method, finding a similar sensitivity to initialisation and showing that both formulations exhibit an energy landscape with many local minima. Collectively, the results highlight a nuanced privacy implication: even plausible-looking reconstructions may not reliably reveal whether a given image was part of the training data, complicating membership claims in practical settings.
Abstract
Being able to reconstruct training data from the parameters of a neural network is a major privacy concern. Previous works have shown that reconstructing training data is possible under certain circumstances. In this work, we analyse such reconstructions empirically and propose a new formulation of the reconstruction as the solution to a bilevel optimisation problem. We demonstrate that our formulation, as well as previous approaches, depends strongly on the initialisation of the training images $x$ to reconstruct. In particular, we show that a random initialisation of $x$ can lead to reconstructions that resemble valid training samples while not being part of the actual training dataset. Thus, our experiments on affine and one-hidden-layer networks suggest that, when reconstructing natural images, an adversary cannot identify whether reconstructed images have indeed been part of the set of training samples.
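
To make the bilevel structure concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a linear model trained by ridge regression, so the lower-level problem has a closed-form solution, and it assumes the labels $y$ are known and held fixed. The names `train`, `upper_loss`, `upper_grad` and `reconstruct`, the regulariser `lam`, and the problem sizes are all hypothetical choices made for illustration. The upper-level loss is taken to be $l(\theta^*, \theta(x,y)) = \tfrac{1}{2}\lVert \theta(x,y) - \theta^* \rVert^2$, minimised over $x$ by gradient descent with a simple backtracking step.

```python
import numpy as np


def train(X, y, lam=1.0):
    """Lower-level problem (assumed): ridge regression solved in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def upper_loss(X, y, theta_star, lam=1.0):
    """Upper-level loss: squared distance between retrained and given parameters."""
    r = train(X, y, lam) - theta_star
    return 0.5 * float(r @ r)


def upper_grad(X, y, theta_star, lam=1.0):
    """Gradient of upper_loss with respect to X, differentiating through train."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    theta = np.linalg.solve(A, X.T @ y)
    g = np.linalg.solve(A, theta - theta_star)  # A is symmetric
    e = y - X @ theta                           # training residual
    return np.outer(e, g) - np.outer(X @ g, theta)


def reconstruct(X0, y, theta_star, lam=1.0, steps=500):
    """Gradient descent on X; a step is accepted only if it lowers the loss."""
    X = X0.copy()
    loss = upper_loss(X, y, theta_star, lam)
    for _ in range(steps):
        G = upper_grad(X, y, theta_star, lam)
        step = 1.0
        while step > 1e-12:
            X_new = X - step * G
            new_loss = upper_loss(X_new, y, theta_star, lam)
            if new_loss < loss:
                X, loss = X_new, new_loss
                break
            step *= 0.5
        else:
            break  # no descent step found, stop early
    return X, loss


rng = np.random.default_rng(0)
n, d = 4, 6                                  # 4 toy "images" with 6 "pixels" each
X_true = rng.normal(size=(n, d))             # ground-truth training inputs
y = rng.normal(size=n)                       # labels, assumed known to the adversary
theta_star = train(X_true, y)                # parameters the adversary observes

X0_near = X_true + 1e-3 * rng.normal(size=(n, d))  # initialisation near ground truth
X0_rand = rng.normal(size=(n, d))                  # random initialisation

X_near, loss_near = reconstruct(X0_near, y, theta_star)
X_rand, loss_rand = reconstruct(X0_rand, y, theta_star)

print("near init:   loss", loss_near, "distance to truth", np.linalg.norm(X_near - X_true))
print("random init: loss", loss_rand, "distance to truth", np.linalg.norm(X_rand - X_true))
```

In this toy setting the unknown $x$ has more entries than $\theta^*$, so many different inputs can reproduce the same parameters. Comparing the two printed distances gives a rough feel for the initialisation dependence described above, although a linear toy model says nothing about how the effect behaves for natural images.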
