How Much Training Data is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation

Koren Abitbul; Yehuda Dar

TL;DR

The paper studies memorization in overparameterized autoencoders by recasting it as an inverse problem: recover a training sample $\mathbf{x}$ from a degraded version $\mathbf{y}=\mathbf{H}\mathbf{x}+\boldsymbol{\epsilon}$, using a regularizer $s_f$ induced by the trained autoencoder $f$, when the degradation operator $\mathbf{H}$ is unknown. It introduces a practical alternating-minimization algorithm that jointly estimates $\mathbf{x}$ and $\mathbf{H}$: the estimate of $\mathbf{x}$ is computed by a plug-and-play ADMM inner loop in which the proximal-like step is replaced by the black-box autoencoder $f$. The authors show that certain 2-layer tied autoencoders are Moreau proximity operators, which motivates the plug-and-play substitution, and they report substantial empirical gains in training-data recovery over prior memorization-evaluation methods across multiple architectures and datasets, including moderately overfitted models and large-scale regimes. The result is a scalable, data-specific approach to quantifying memorization in autoencoders, with implications for understanding overparameterization, privacy risks, and data-regression capabilities in deep representations. The method improves recovery of training data while still not recovering non-training data, which makes it a precise diagnostic of memorization in deep autoencoders.
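For orientation, the formulation summarized above can be written out in the standard plug-and-play ADMM form. This is a sketch under stated assumptions, not a transcription of the paper's equations: the quadratic data-fidelity term, the weight $\lambda$, the penalty parameter $\rho$, the admissible set $\mathcal{H}$ of degradation operators, the auxiliary variable $\mathbf{v}$, and the scaled dual variable $\mathbf{u}$ are all notation introduced here for illustration.

$$
(\widehat{\mathbf{x}}, \widehat{\mathbf{H}}) = \arg\min_{\mathbf{x},\, \mathbf{H} \in \mathcal{H}} \; \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda\, s_f(\mathbf{x})
$$

Alternating minimization then iterates, for $t = 1, 2, \dots$:

$$
\widehat{\mathbf{x}}^{(t)} = \arg\min_{\mathbf{x}} \; \tfrac{1}{2}\|\widehat{\mathbf{H}}^{(t-1)}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda\, s_f(\mathbf{x}),
\qquad
\widehat{\mathbf{H}}^{(t)} = \arg\min_{\mathbf{H} \in \mathcal{H}} \; \tfrac{1}{2}\|\mathbf{H}\widehat{\mathbf{x}}^{(t)} - \mathbf{y}\|_2^2
$$

The $\mathbf{x}$-step is handled by an ADMM inner loop whose proximal step is swapped for the autoencoder:

$$
\begin{aligned}
\mathbf{x} &\leftarrow \arg\min_{\mathbf{x}} \; \tfrac{1}{2}\|\widehat{\mathbf{H}}\mathbf{x} - \mathbf{y}\|_2^2 + \tfrac{\rho}{2}\|\mathbf{x} - \mathbf{v} + \mathbf{u}\|_2^2 \\
\mathbf{v} &\leftarrow \operatorname{prox}_{(\lambda/\rho)\, s_f}(\mathbf{x} + \mathbf{u}) \;\approx\; f(\mathbf{x} + \mathbf{u}) \\
\mathbf{u} &\leftarrow \mathbf{u} + \mathbf{x} - \mathbf{v}
\end{aligned}
$$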

Abstract

Overparameterized autoencoder models often memorize their training data. For image data, memorization is often examined by using the trained autoencoder to recover missing regions in its training images (that were used only in their complete forms in the training). In this paper, we propose an inverse problem perspective for the study of memorization. Given a degraded training image, we define the recovery of the original training image as an inverse problem and formulate it as an optimization task. In our inverse problem, we use the trained autoencoder to implicitly define a regularizer for the particular training dataset that we aim to retrieve from. We develop the intricate optimization task into a practical method that iteratively applies the trained autoencoder and relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method for blind inpainting where the goal is to recover training images from degradation of many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse train loss values), and show that our method significantly outperforms previous memorization-evaluation methods that recover training data from autoencoders. Importantly, our method greatly improves the recovery performance also in settings that were previously considered highly challenging, and even impractical, for such recovery and memorization evaluation.
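The iteration described in the abstract (apply the trained autoencoder, then perform simple computations that estimate and account for the unknown missing-pixel mask) can be illustrated with a small NumPy sketch. Everything named here is hypothetical and chosen for illustration: `make_toy_autoencoder` is a stand-in memorizing map rather than a trained network, `recover` uses a generic plug-and-play ADMM inner loop, and the threshold rule for the mask (parameter `tau`) is an assumed heuristic, not the paper's estimator.

```python
import numpy as np

def make_toy_autoencoder(train, beta=50.0):
    """Hypothetical stand-in for a trained, memorizing autoencoder f.

    Maps an input to a softmax-weighted average of the stored training rows,
    so inputs close to a training sample are pulled onto that sample.
    """
    def f(z):
        d2 = np.sum((train - z) ** 2, axis=1)   # squared distance to each training row
        w = np.exp(-beta * (d2 - d2.min()))     # shift by the minimum for numerical stability
        return (w / w.sum()) @ train
    return f

def recover(y, f, outer_iters=5, inner_iters=10, rho=2.0, tau=0.1):
    """Blind inpainting sketch: alternate between estimating x and the mask h."""
    h = np.ones_like(y)   # initial mask estimate: every pixel treated as observed
    v = y.copy()
    for _ in range(outer_iters):
        u = np.zeros_like(y)                    # scaled dual variable, reset per outer step
        for _ in range(inner_iters):
            # Data-fidelity step: closed form because the degradation is diagonal.
            x = (h * y + rho * (v - u)) / (h * h + rho)
            # Plug-and-play step: the autoencoder replaces the proximal operator.
            v = f(x + u)
            u = u + x - v
        # Assumed heuristic: a pixel counts as observed if it agrees with the estimate.
        h = (np.abs(y - v) < tau).astype(float)
    return v, h

# Toy demonstration on synthetic "images" (flattened vectors with values in [0, 1)).
rng = np.random.default_rng(0)
train = rng.random((20, 1024))
x_true = train[3]
mask = (rng.random(1024) > 0.5).astype(float)   # about half of the pixels erased
y = mask * x_true
x_hat, h_hat = recover(y, make_toy_autoencoder(train))
print("max abs error:", np.max(np.abs(x_hat - x_true)))
print("mask agreement:", np.mean(h_hat == mask))
```

In this toy setting the mask estimate can differ from the true mask on erased pixels whose true value is already close to zero; that is harmless for the recovery because the observation and the estimate nearly coincide there.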

Paper Structure (30 sections, 5 theorems, 32 equations, 11 figures, 2 tables, 1 algorithm)

Introduction
Recovery of Training Data as an Inverse Problem
Overparameterized Autoencoders: A Definition
The Recovery Problem
The Inverse Problem Perspective to Training Data Recovery
A Practical Algorithm for Training Data Recovery
Estimation of a Training Sample x for a Given Degradation Operator Estimate
The ADMM Form of the Recovery Optimization.
Addressing the Optimization in the Method-of-Multipliers Step 2 and its Implicit Regularizer $s_f$.
Addressing the Pixel Erasure Degradation in the Method-of-Multipliers Step 1.
Estimation of a degradation operator for a given training sample estimate.
Experimental Results
Practical definitions of successful recovery and evaluation metrics.
The evaluated recovery methods.
Recovery performance on training data.
...and 15 more sections

Key Result

Theorem

Any $2$-layer tied autoencoder with is a Moreau proximity operator.
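For reference, the Moreau proximity operator of a function $\varphi$ is the standard mapping below. The symbols $\varphi$ and $\mathbf{z}$ are notation introduced here, and no condition on the autoencoder beyond what the theorem itself states is assumed.

$$
\operatorname{prox}_{\varphi}(\mathbf{z}) = \arg\min_{\mathbf{x}} \; \tfrac{1}{2}\|\mathbf{x} - \mathbf{z}\|_2^2 + \varphi(\mathbf{x})
$$

Read this way, the theorem says that for such an autoencoder $f$ there is a function $\varphi$ with $f = \operatorname{prox}_{\varphi}$, which is what supports using $f$ in place of the proximal step associated with the regularizer $s_f$ in a plug-and-play ADMM scheme.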

Figures (11)

Figure 1: Iterative recovery of a degraded training image using our proposed approach (top frame) and the method from previous works (bottom frame).
Figure 2: The estimation of $\widehat{\mathbf{H}}$ along the iterations. $\widehat{\mathbf{H}}^{(t)}$ is the estimate of $\mathbf{H}$ at the $t^{\rm th}$ iteration of the proposed recovery algorithm via alternating minimization (optimizing $\mathbf{x}$ for fixed $\mathbf{H}$, then optimizing $\mathbf{H}$ for fixed $\mathbf{x}$). Here, the diagonal matrix $\widehat{\mathbf{H}}^{(t)}$ is shown as an image.
Figure 3: Accurate-recovery rates for recovery from degradation due to various missing-pixel masks, tested on different architectures. The evaluated recovery methods are the proposed method for an unknown mask (orange curves), the proposed method for a known mask (blue curves), and simple iterations of the autoencoder only (green curves). The generic inpainting method DDNM achieved a 0% accurate-recovery rate in all these settings and is therefore not plotted.
Figure C.1: Architectures of the 10-layer and 20-layer fully connected autoencoders for the Tiny ImageNet dataset (a subset of images, at $64 \times 64 \times 3$ pixel size).
Figure C.2: Architecture of the U-Net autoencoder for the CIFAR-10 and SVHN datasets (subsets of images, at $32 \times 32 \times 3$ pixel size). Stride and padding are 1, the kernel size is $3 \times 3$, and an activation function is applied after every convolution.
...and 6 more figures

Theorems & Definitions (10)

Definition
Definition
Theorem
Corollary
Lemma A.2
Proof
Corollary A.2
Lemma A.3
Proof
Proof
