
Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel M. Roy

TL;DR

The paper investigates how memorization interacts with learning in stochastic convex optimization (SCO) by quantifying how much information a learner leaks about its training data through conditional mutual information (CMI). It establishes a precise CMI–accuracy tradeoff for $\varepsilon$-learners: CMI lower bounds of $\Omega\left(1/\varepsilon^{2}\right)$ in CLB (convex, Lipschitz, bounded) SCOs and $\Omega\left(1/\varepsilon\right)$ in strongly convex (CSL) SCOs. It also proves that memorization is necessary, via adversarial fingerprinting constructions. It further shows that CMI-based generalization bounds can be vacuous for SCO learners with optimal sample complexity, and proves that constant-size sample compression schemes do not exist in SCO, with individual-sample CMI (ISCMI) offering no improvement. The results imply fundamental limits on information-theoretic approaches to generalization in SCO, highlight the essential role of memorization in SCO, and extend to both proper and improper learners and to the individual-sample CMI framework. Collectively, the work provides a rigorous information-theoretic view of memorization, learning, and generalization in SCO and clarifies the limits of existing CMI-based guarantees.
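For reference, the CMI of Steinke and Zakynthinou (2020) is commonly written as below. The symbols are assumptions taken from that framework's usual notation, and the paper's own notation may differ:

- $\tilde Z$ for the supersample,
- $U$ for the selector, drawn independently of $\tilde Z$,
- $\mathcal{A}_n$ for the learner trained on $n$ points.

```latex
% Supersample: n rows, each holding two i.i.d. draws from the data distribution D
\tilde Z = (\tilde Z_{i,j})_{i \in [n],\, j \in \{0,1\}}, \qquad \tilde Z_{i,j} \overset{\text{i.i.d.}}{\sim} \mathcal{D}
% Selector: U_i picks which point of row i enters the training set
U \sim \mathrm{Unif}(\{0,1\}^n), \qquad \tilde Z_U = (\tilde Z_{1,U_1}, \dots, \tilde Z_{n,U_n})
% CMI: how much the learner's output reveals about which points were used for training
\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n) = I\bigl(\mathcal{A}_n(\tilde Z_U);\, U \,\big|\, \tilde Z\bigr) \in [0,\, n \log 2]
```

Since $U$ carries exactly $n \log 2$ nats, a learner whose output determines its whole training set attains the upper end. A learner that ignores its data has CMI $0$.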

Abstract

In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz–bounded setting and under strong convexity, every learner with an excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.
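To make "the information a learning algorithm reveals about its training data points" concrete, the sketch below computes CMI exactly for toy deterministic learners on one fixed supersample. It assumes the Steinke–Zakynthinou setup:

- a supersample of $n$ pairs of points;
- a uniform selector $U \in \{0,1\}^n$ that picks one point from each pair as training data;
- CMI $= I(\mathcal{A}(\tilde Z_U); U \mid \tilde Z)$.

Because each learner is deterministic, $H(\mathcal{A}(\tilde Z_U) \mid U, \tilde Z) = 0$. For a fixed supersample, the CMI therefore equals the entropy of the output over the $2^n$ equally likely choices of $U$.

The helper `conditional_cmi_bits` and the three learners are hypothetical illustrations, not the paper's constructions. The true CMI would additionally average over supersamples drawn from $\mathcal{D}$.

```python
import itertools
import math
from collections import Counter
from fractions import Fraction
from typing import Callable, Hashable, Sequence, Tuple

# A supersample: n rows, each a pair (z_{i,0}, z_{i,1}) of candidate points.
Supersample = Sequence[Tuple[int, int]]


def conditional_cmi_bits(learner: Callable[[Tuple[int, ...]], Hashable],
                         supersample: Supersample) -> float:
    """Hypothetical helper: I(A(z_U); U | Z = z) in bits for a fixed supersample.

    Assumes `learner` is deterministic, so H(A(z_U) | U) = 0 and the
    conditional mutual information equals the entropy of A's output when
    U is uniform over {0,1}^n.
    """
    n = len(supersample)
    counts: Counter = Counter()
    for u in itertools.product((0, 1), repeat=n):
        # U_i selects which point of the i-th pair enters the training set.
        training_set = tuple(supersample[i][u[i]] for i in range(n))
        counts[learner(training_set)] += 1
    total = 2 ** n
    # Shannon entropy (in bits) of the output distribution induced by U.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def ignore_data(s):
    """Outputs a constant, so it reveals nothing about the training set."""
    return 0


def memorize(s):
    """Outputs the training set itself, the maximal-leakage learner."""
    return s


def empirical_mean(s):
    """Exact rational mean, so training sets with equal means collide."""
    return Fraction(sum(s), len(s))


supersample = [(0, 1)] * 3
print(conditional_cmi_bits(ignore_data, supersample))               # 0.0
print(conditional_cmi_bits(memorize, supersample))                  # 3.0
print(round(conditional_cmi_bits(empirical_mean, supersample), 3))  # 1.811
```

The three learners span the range:

- The learner that ignores its data reveals nothing.
- The learner that returns its training set reveals all $n$ bits, the maximum.
- The empirical mean sits in between, because many selections $U$ produce the same mean.

The paper's results say that, in SCO, accurate learners must have CMI that grows as $\varepsilon$ shrinks.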

Paper Structure

This paper contains 63 sections, 34 theorems, 189 equations, 4 algorithms.

Key Result

Theorem 4.1

Let $\varepsilon_0 \in (0,1)$ be a universal constant. There exists a loss function $f(\cdot,z)$, $1$-Lipschitz for every $z$, such that: for every $\varepsilon \leq \varepsilon_0$ and for every algorithm $\mathcal{A}=\{\mathcal{A}_n\}_{n \in \mathbb{N}}$ that $\varepsilon$-learns with sample …

Theorems & Definitions (52)

  • Definition 3.1
  • Definition 3.2
  • Theorem 4.1: CMI-accuracy tradeoff
  • Theorem 4.2: CMI-accuracy tradeoff, strongly convex case
  • Definition 4.3: Recall Game for the $i$-th example
  • Definition 4.4: Soundness and recall
  • Theorem 4.5: Memorization/membership inference attack
  • Theorem 4.6: Memorization/membership inference attack, strongly convex case
  • Theorem 5.1: from Haghifam et al. (2023)
  • Theorem 5.2: Non-optimality of CMI generalization bound in SCO
  • ...and 42 more
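A back-of-the-envelope calculation suggests why the CMI–accuracy tradeoff makes CMI generalization bounds loose (compare the Theorem 5.2 entry above). It rests on four assumptions:

- the Steinke–Zakynthinou bound holds for a loss rescaled to $[0,1]$;
- the standard optimal sample complexities apply: $n = \Theta(1/\varepsilon^2)$ in the Lipschitz–bounded setting and $n = \Theta(1/\varepsilon)$ under strong convexity;
- the learner's CMI at that sample size meets the lower bounds stated in the abstract;
- $c, C > 0$ are placeholder constants, and $L_{\mathcal{D}}$, $L_S$ (population and empirical loss) are assumed notation.

This is a heuristic reading, not the proof of Theorem 5.2.

```latex
% Steinke--Zakynthinou CMI bound for a loss taking values in [0,1]
\bigl|\mathbb{E}[L_{\mathcal{D}}(\mathcal{A}_n(S)) - L_S(\mathcal{A}_n(S))]\bigr| \le \sqrt{2\,\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n)/n}
% Lipschitz--bounded case: n = C/\varepsilon^2 and CMI >= c/\varepsilon^2
\sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n)}{n}} \ge \sqrt{\frac{2c/\varepsilon^{2}}{C/\varepsilon^{2}}} = \sqrt{\frac{2c}{C}} = \Omega(1)
% Strongly convex case: n = C/\varepsilon and CMI >= c/\varepsilon
\sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n)}{n}} \ge \sqrt{\frac{2c/\varepsilon}{C/\varepsilon}} = \sqrt{\frac{2c}{C}} = \Omega(1)
```

In both cases the right-hand side stays bounded away from zero as $\varepsilon \to 0$. The CMI bound alone therefore cannot certify an $O(\varepsilon)$ guarantee at the optimal sample size.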