Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Idan Attias; Gintare Karolina Dziugaite; Mahdi Haghifam; Roi Livni; Daniel M. Roy

TL;DR

The paper investigates how memorization interacts with learning in stochastic convex optimization (SCO) by quantifying how much information a learner reveals about its training data, measured by conditional mutual information (CMI). It establishes a precise CMI–accuracy tradeoff for $\varepsilon$-learners: the CMI is lower bounded by $\Omega\left(1/\varepsilon^{2}\right)$ in the Lipschitz-bounded (CLB) setting and by $\Omega\left(1/\varepsilon\right)$ in the strongly convex (CSL) setting. It proves that memorization is necessary by constructing adversaries, based on fingerprinting, that identify a significant fraction of the training samples. It also shows that CMI-based generalization bounds can be vacuous for SCO learners with optimal sample complexity, that the individual-sample variant of CMI (ISCMI) offers no improvement, and that constant-size sample compression schemes do not exist for SCO. The results apply to both proper and improper learners. Together they give an information-theoretic account of memorization, learning, and generalization in SCO and clarify the limits of existing CMI-based guarantees.

Abstract

In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz-bounded setting and under strong convexity, every learner with an excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.
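
As a brief sketch of the quantities involved, the CMI of Steinke and Zakynthinou (2020) and its standard generalization bound can be written as follows. The constants $c$ and $C$, the assumption that the loss is rescaled to take values in $[0,1]$, and the assumption that the learner uses exactly $n = C/\varepsilon^{2}$ samples are illustrative choices made here, not statements taken from the paper.

$$
\tilde{Z} \in \mathcal{Z}^{n \times 2} \ \text{drawn i.i.d. from } \mathcal{D}, \qquad U \sim \mathrm{Unif}\left(\{0,1\}^{n}\right), \qquad S = \left(\tilde{Z}_{i,U_i}\right)_{i=1}^{n}
$$

$$
\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n) = I\left(\mathcal{A}_n(S);\, U \,\middle|\, \tilde{Z}\right) \le n \log 2
$$

$$
\left| \mathbb{E}\left[ \text{population loss} - \text{empirical loss} \right] \right| \le \sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n)}{n}}
$$

Under these assumptions, plugging a lower bound $\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n) \ge c/\varepsilon^{2}$ and a sample size $n = C/\varepsilon^{2}$ into the right-hand side gives

$$
\sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n)}{n}} \ge \sqrt{\frac{2c/\varepsilon^{2}}{C/\varepsilon^{2}}} = \sqrt{\frac{2c}{C}},
$$

a constant that does not shrink as $\varepsilon \to 0$. The same cancellation occurs in the strongly convex case with $n = C/\varepsilon$ and $\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}_n) \ge c/\varepsilon$. This is the sense in which a CMI-based bound can fail to certify the generalization of a learner that in fact learns well.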

Paper Structure (63 sections, 34 theorems, 189 equations, 4 algorithms)

Introduction
Contributions
Limitation of the CMI Generalization Bound for SCOs.
Necessity of Memorization.
Incompressibility of Samples in SCOs.
Individual-Sample variant of CMI.
Organization
Related Work
Information-Theoretic Measures of Generalization.
Memorization.
Fingerprinting Codes and Privacy Attacks.
Preliminaries
Notations
Background on Information Theory
Stochastic Convex Optimization (SCO)
...and 48 more sections

Key Result

Theorem 4.1

Let $\varepsilon_0 \in (0,1)$ be a universal constant. There exists a loss function $f(\cdot,z)$ that is $1$-Lipschitz for every $z$, such that for every $\varepsilon \leq \varepsilon_0$ and every algorithm $\mathcal{A}=\{\mathcal{A}_n\}_{n \in \mathbb{N}}$ that $\varepsilon$-learns with sample …

Theorems & Definitions (52)

Definition 3.1
Definition 3.2
Theorem 4.1: CMI-accuracy tradeoff
Theorem 4.2: CMI-accuracy tradeoff, strongly convex case
Definition 4.3: Recall Game for $i$-th example
Definition 4.4: Soundness and recall
Theorem 4.5: Memorization/membership inference attack
Theorem 4.6: Memorization/membership inference attack, strongly convex case
Theorem 5.1: Haghifam et al. (2023)
Theorem 5.2: Non-optimality of CMI generalization bound in SCO
...and 42 more
