Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry
Naoki Yoshida, Isao Ishikawa, Masaaki Imaizumi
TL;DR
This work develops a model-based theory of zero generalization error for interpolators in a teacher-student regression setting, using real analytic sets to describe the geometry of the interpolator set and of the teacher-equivalent parameter set. The central result bounds the strong sample complexity by $k(\widehat{\Theta}_n) \le d_\Theta - d_{\bar{\Theta}} + 1$. Because the bound depends only on the gap between the parameter dimension $d_\Theta$ and the TES dimension $d_{\bar{\Theta}}$, zero generalization error is reachable with a finite number of samples that need not grow with the parameter count, provided the TES dimension is correspondingly large. The authors instantiate the theory for deep linear networks and for fully connected deep networks, deriving the explicit TES-based bounds $k(\widehat{\Theta}_n) \le d^* + 1$ and $k(\widehat{\Theta}_n) \le \sum_{\ell=1}^L m_\ell^*(m_{\ell-1}+1) + 1$, respectively. Experiments on DLNNs and on MNIST with near-interpolators agree with the theoretical predictions, showing consistent data-threshold behavior even in practical, overparameterized settings.
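As a rough arithmetic check of the bounds above, the following Python sketch evaluates them for hypothetical sizes. It assumes that $m_\ell^*$ is the teacher's width at layer $\ell$ and $m_{\ell-1}$ is the student's width at layer $\ell-1$ (with $m_0$ the input dimension). This reading, the function names, and all numbers are illustrative assumptions rather than definitions taken from the paper.

```python
def general_bound(d_theta: int, d_tes: int) -> int:
    """Evaluate d_Theta - d_TES + 1, the general bound quoted above."""
    return d_theta - d_tes + 1


def fc_bound(teacher_widths, student_prev_widths) -> int:
    """Evaluate sum_l m*_l * (m_{l-1} + 1) + 1 for a fully connected network.

    teacher_widths[l-1]      plays the role of m*_l     (assumed: teacher width)
    student_prev_widths[l-1] plays the role of m_{l-1}  (assumed: student width,
                                                         m_0 = input dimension)
    """
    if len(teacher_widths) != len(student_prev_widths):
        raise ValueError("both lists must have one entry per layer")
    total = sum(m_star * (m_prev + 1)
                for m_star, m_prev in zip(teacher_widths, student_prev_widths))
    return total + 1


# Hypothetical 3-layer example: input dimension 4, student hidden widths 8 and 8,
# teacher widths 2, 2 and 1.
bound_fc = fc_bound([2, 2, 1], [4, 8, 8])   # 2*5 + 2*9 + 1*9 + 1 = 38
bound_general = general_bound(1000, 990)    # 1000 - 990 + 1 = 11
```

Under this reading, widening the student's layers raises the fully connected bound only through the $m_{\ell-1}$ factors, which are multiplied by the teacher widths and not by the student's own output widths.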
Abstract
We theoretically demonstrate that the generalization error of interpolators for machine learning models under teacher-student settings becomes zero once the number of training samples exceeds a certain threshold. Understanding the high generalization ability of large-scale models such as deep neural networks (DNNs) remains one of the central open problems in machine learning theory. While recent theoretical studies have attributed this phenomenon to the implicit bias of stochastic gradient descent (SGD) toward well-generalizing solutions, empirical evidence indicates that it primarily stems from properties of the model itself. Specifically, even randomly sampled interpolators, that is, parameters that achieve zero training error, have been observed to generalize effectively. In this study, under a teacher-student framework, we prove that the generalization error of randomly sampled interpolators becomes exactly zero once the number of training samples exceeds a threshold determined by the geometric structure of the interpolator set in parameter space. Our proof uses tools from algebraic geometry to characterize this geometric structure.
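The threshold behavior can be seen in a toy case that is much simpler than the models treated in the paper. The sketch below uses a noiseless linear teacher $y = w^{*\top} x$ with $d$ parameters, for which the interpolator set is an affine subspace and the only teacher-equivalent parameter is $w^*$ itself. A random interpolator is drawn by adding a Gaussian null-space component to the minimum-norm solution. That sampling rule, the Gaussian inputs, and all names are assumptions made for illustration, not the authors' experimental setup.

```python
import numpy as np


def random_interpolator(X, y, rng):
    """Sample a parameter w with X @ w == y (up to floating-point error)."""
    w_min = np.linalg.pinv(X) @ y                 # minimum-norm interpolator
    _, s, Vt = np.linalg.svd(X, full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    null_basis = Vt[rank:].T                      # basis of null(X), shape (d, d - rank)
    coeffs = rng.standard_normal(null_basis.shape[1])
    return w_min + null_basis @ coeffs            # null space is empty once rank == d


def generalization_error(w, w_teacher):
    """For x ~ N(0, I), E[(w.x - w_teacher.x)^2] equals ||w - w_teacher||^2."""
    return float(np.sum((w - w_teacher) ** 2))


rng = np.random.default_rng(0)
d = 10
w_teacher = rng.standard_normal(d)

errors = {}
for n in (5, 9, 10, 11):
    X = rng.standard_normal((n, d))
    y = X @ w_teacher                             # noiseless teacher labels
    w = random_interpolator(X, y, rng)
    assert np.allclose(X @ w, y)                  # w really interpolates the data
    errors[n] = generalization_error(w, w_teacher)
```

In this toy model the error is strictly positive while the interpolator set still has positive dimension ($n < d$) and collapses to floating-point zero once generic samples pin down $w^*$ ($n \ge d$). This is consistent with the general bound $d_\Theta - d_{\bar{\Theta}} + 1 = d + 1$ if that bound is read as a sufficient sample count.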
