
Unveiling Privacy, Memorization, and Input Curvature Links

Deepak Ravikumar, Efstathia Soufleri, Abolfazl Hashemi, Kaushik Roy

TL;DR

The paper tackles the interplay between memorization in deep neural networks, input loss curvature, and differential privacy. It introduces three theorems linking memorization to curvature (Theorem 5.1), privacy to curvature (Theorem 5.3), and privacy to memorization (Theorem 5.4). The resulting bounds show that memorization is controlled by curvature and privacy, and that stronger privacy reduces curvature. The authors validate these predictions empirically on CIFAR100 and ImageNet, using Hutchinson's trace estimator to measure curvature and DP-SGD to train private models, and report close agreement with the theory. The work suggests that input curvature can serve as a compute-efficient surrogate for memorization and offers practical guidance for privacy-preserving deep learning.
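Input loss curvature is the trace of the Hessian of the loss with respect to the input $x$, and Hutchinson's estimator approximates a trace as the average of $v^\top H v$ over random probe vectors $v$ with $\mathbb{E}[v v^\top] = I$. The sketch below is a minimal pure-Python illustration, not the authors' implementation. It assumes Rademacher (±1) probes and replaces the autodiff Hessian-vector product that a deep-learning framework would provide with a central finite difference of the input gradient; the helper names `hessian_vector_product` and `hutchinson_trace` are hypothetical.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
GradFn = Callable[[Vector], Vector]  # maps an input x to dLoss/dx


def hessian_vector_product(grad_fn: GradFn, x: Vector, v: Vector, h: float = 1e-4) -> Vector:
    """Approximate H(x) @ v with a central difference of the input gradient.

    Assumption: finite differences stand in for an exact autodiff
    Hessian-vector product (e.g. double backpropagation).
    """
    x_plus = [xi + h * vi for xi, vi in zip(x, v)]
    x_minus = [xi - h * vi for xi, vi in zip(x, v)]
    g_plus, g_minus = grad_fn(x_plus), grad_fn(x_minus)
    return [(gp - gm) / (2.0 * h) for gp, gm in zip(g_plus, g_minus)]


def hutchinson_trace(grad_fn: GradFn, x: Vector, num_probes: int = 10,
                     rng: Optional[random.Random] = None) -> float:
    """Estimate tr(H(x)) as the mean of v^T H v over Rademacher probes v."""
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in x]  # Rademacher probe: E[v v^T] = I
        hv = hessian_vector_product(grad_fn, x, v)
        total += sum(vi * hvi for vi, hvi in zip(v, hv))  # one sample of v^T H v
    return total / num_probes


# Toy check: l(x) = 0.5 * sum(a_i * x_i**2) has input Hessian diag(a), so tr(H) = 6.
a = [1.0, 2.0, 3.0]
grad = lambda x: [ai * xi for ai, xi in zip(a, x)]
print(hutchinson_trace(grad, [0.5, -1.0, 2.0]))  # ≈ 6.0
```

For a diagonal Hessian every Rademacher probe returns the exact trace, which makes the toy quadratic a convenient sanity check. For a real network, `grad_fn` would return the gradient of one sample's loss with respect to its input image.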

Abstract

Deep Neural Nets (DNNs) have become a pervasive tool for solving many emerging problems. However, they tend to overfit to and memorize the training set. Memorization is of keen interest since it is closely related to several concepts such as generalization, noisy learning, and privacy. To study memorization, Feldman (2019) proposed a formal score; however, its computational requirements limit its practical use. Recent research has shown empirical evidence linking input loss curvature (measured by the trace of the loss Hessian w.r.t. inputs) and memorization, with curvature being ~3 orders of magnitude cheaper to compute than the memorization score. However, there is a lack of theoretical understanding linking memorization with input loss curvature. In this paper, we not only investigate this connection but also extend our analysis to establish theoretical links between differential privacy, memorization, and input loss curvature. First, we derive an upper bound on memorization characterized by both differential privacy and input loss curvature. Second, we present a novel insight showing that input loss curvature is upper-bounded by the differential privacy parameter. Our theoretical findings are further empirically validated using deep models on CIFAR and ImageNet datasets, showing a strong correlation between our theoretical predictions and results observed in practice.
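For reference, the memorization score of Feldman (2019) is usually written in the standard form below, where $\mathcal{A}$ is a (randomized) training algorithm, $S$ is the training set, and $S^{\setminus i}$ is $S$ with the example $(x_i, y_i)$ removed. Estimating both probabilities requires training many models with and without each example, which is the computational cost that input loss curvature is meant to avoid.

$$
\mathrm{mem}(\mathcal{A}, S, i) = \Pr_{h \sim \mathcal{A}(S)}\left[h(x_i) = y_i\right] - \Pr_{h \sim \mathcal{A}(S^{\setminus i})}\left[h(x_i) = y_i\right]
$$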

Paper Structure

This paper contains 23 sections, 8 theorems, 67 equations, and 6 figures.

Key Result

Theorem 5.1: Curvature Upper Bounds Memorization

Let the assumptions of error stability, generalization, and uniform model bias hold, assume $\upsilon$-adjacency of the dataset, and assume that the loss is bounded such that $0 \leq \ell \leq L$. Then, with probability at least $1 - \delta$, it holds that …

Figures (6)

  • Figure 1: Our theoretical framework provides upper bounds in Theorems 5.1, 5.3, and 5.4. These are visualized as links between Differential Privacy, Memorization, and Input Loss Curvature.
  • Figure 2: Images from ImageNet ranked using input loss curvature. Input loss curvature was obtained using a single ResNet18 trained on ImageNet. Ten lowest curvature samples (left) and ten highest curvature samples (right) from the training set are visualized for 5 classes (each row is a class) from ImageNet. Low curvature samples are 'prototypical' of their class, while high curvature samples are rare, difficult, and more likely memorized instances.
  • Figure 3: Plot of memorization score vs. input loss curvature at the end of training for CIFAR100 (average over 1000 Small Inception models) and ImageNet (average over 100 ResNet50 models).
  • Figure 4: Plot of differential privacy vs. loss bound for CIFAR100 trained with cross-entropy, with the best-fit curve (dashed).
  • Figure 5: Plot of privacy vs. loss curvature for CIFAR10 and CIFAR100. The best-fit curve (dashed) is predicted by Theorem 5.3.
  • ...and 1 more figure
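The selection shown in Figure 2 (for each class, the ten lowest-curvature and ten highest-curvature training samples) can be sketched as below. This is a hypothetical helper, not code from the paper, and it assumes per-sample input loss curvature scores have already been computed, for example with a trace estimator as described in the TL;DR.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def curvature_extremes_by_class(labels: Sequence[int], curvatures: Sequence[float],
                                k: int = 10) -> Dict[int, Tuple[List[int], List[int]]]:
    """Return, per class, indices of the k lowest- and k highest-curvature samples."""
    by_class: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for idx, (label, curv) in enumerate(zip(labels, curvatures)):
        by_class[label].append((curv, idx))

    extremes: Dict[int, Tuple[List[int], List[int]]] = {}
    for label, items in by_class.items():
        items.sort()  # ascending curvature; ties broken by sample index
        lowest = [idx for _, idx in items[:k]]  # "prototypical" samples
        highest = [idx for _, idx in reversed(items[-k:])]  # rare/difficult, highest first
        extremes[label] = (lowest, highest)
    return extremes


# Example: two classes, k = 1.
print(curvature_extremes_by_class([0, 0, 0, 1, 1], [0.5, 0.1, 0.9, 2.0, 1.0], k=1))
```

Sorting within each class keeps the ranking class-conditional, matching the per-row (per-class) layout of the figure rather than a single global ranking.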

Theorems & Definitions (8)

  • Theorem 5.1: Curvature Upper Bounds Memorization
  • Lemma 5.2: Privacy $\implies$ Stability
  • Theorem 5.3: Privacy $\implies$ Low Input Loss Curvature
  • Theorem 5.4: Privacy $\implies$ Less Memorization
  • Lemma 1.1
  • Lemma 1.2
  • Lemma 1.3
  • Lemma 1.4
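The privacy results above concern differentially private models, which the TL;DR states were trained with DP-SGD. A minimal sketch of one standard DP-SGD update (per-example gradient clipping followed by Gaussian noise) is shown below. The function and parameter names are illustrative assumptions rather than the paper's code, and the privacy accounting that converts the noise multiplier into a concrete $\epsilon$ is omitted.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def clip_to_norm(g: Sequence[float], max_norm: float) -> Vector:
    """Scale g so its L2 norm is at most max_norm (unchanged if already within it)."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [gi * scale for gi in g]


def dp_sgd_step(params: Sequence[float], per_example_grads: Sequence[Sequence[float]],
                lr: float, max_norm: float, noise_multiplier: float,
                rng: Optional[random.Random] = None) -> Vector:
    """One DP-SGD update: clip each example's gradient, sum, add Gaussian noise, average."""
    if rng is None:
        rng = random.Random()
    batch_size = len(per_example_grads)
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise std is calibrated to the per-example sensitivity, which clipping bounds by max_norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


new_params = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.0, 0.5]], lr=0.1,
                         max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))
```

For a fixed batch size, sampling rate, and number of steps, a larger `noise_multiplier` yields a smaller $\epsilon$ (stronger privacy), which according to Theorem 5.3 should also lower the attainable input loss curvature.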