Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization
Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel M. Roy
TL;DR
The paper investigates how memorization interacts with learning in stochastic convex optimization (SCO) by quantifying how much information a learner leaks about its training data, measured by conditional mutual information (CMI). It establishes a precise CMI–accuracy tradeoff for $\varepsilon$-learners: the CMI is at least $\Omega\left(1/\varepsilon^{2}\right)$ in the convex, Lipschitz, bounded (CLB) setting and at least $\Omega\left(1/\varepsilon\right)$ in the strongly convex (CSL) setting. It proves that memorization is necessary by constructing a fingerprinting-style adversary that identifies a significant fraction of the training samples. It also shows that CMI-based generalization bounds can be vacuous for SCO learners with optimal sample complexity, that individual-sample CMI (ISCMI) offers no improvement, and that constant-size sample compression schemes do not exist in SCO. The results hold for both proper and improper learners. Collectively, they give a rigorous information-theoretic account of memorization, learning, and generalization in SCO and mark the limits of existing CMI-based guarantees.
Abstract
In this work, we investigate the interplay between memorization and learning in the context of \emph{stochastic convex optimization} (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz--bounded setting and under strong convexity, every learner with excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.
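For orientation, the following is a sketch of the CMI quantity of Steinke and Zakynthinou (2020) and of the associated generalization bound. The symbols $\tilde{Z}$, $U$, $\mathcal{A}$, $\mathcal{D}$, and $n$ are notation assumed here rather than taken from the abstract, the bound is stated for a loss with range $[0,1]$, and constants depending on the loss range are otherwise suppressed.

\[
\begin{aligned}
&\tilde{Z} \in \mathcal{Z}^{n \times 2} \text{ with i.i.d. entries from } \mathcal{D}, \qquad U \sim \mathrm{Unif}\left(\{0,1\}^{n}\right) \text{ independent of } \tilde{Z}, \\
&\mathrm{CMI}_{\mathcal{D}}(\mathcal{A}) = I\left(\mathcal{A}(\tilde{Z}_{U});\, U \,\middle|\, \tilde{Z}\right) \le n \log 2, \\
&\left|\mathbb{E}\left[\text{population loss} - \text{empirical loss}\right]\right| \le \sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(\mathcal{A})}{n}}.
\end{aligned}
\]

Here $\tilde{Z}_{U}$ denotes the $n$ training points selected from the supersample $\tilde{Z}$ by the bits of $U$. Assuming a learner that uses $n = \Theta(1/\varepsilon^{2})$ samples, the standard sample complexity in the Lipschitz--bounded setting, a lower bound of $\Omega(1/\varepsilon^{2}) = \Omega(n)$ on the CMI makes the right-hand side $\Omega(1)$, which is the sense in which a CMI-based generalization bound becomes vacuous.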
