Stochastic Gradient Descent for Nonparametric Additive Regression
Xin Chen, Jason M. Klusowski
TL;DR
This work introduces Functional SGD (F-SGD), an online algorithm for training nonparametric additive models that saves memory and computation by updating only the coefficients of a truncated basis expansion. The authors establish an oracle inequality that tolerates model mis-specification and show that the estimator attains minimax-optimal rates over Sobolev ellipsoid function classes; polynomial convergence rates hold even when the covariate distribution lacks full support. The method scales favorably relative to kernel methods and competing online approaches, can adapt to unknown smoothness via Lepski’s method, and may extend to general convex losses. In practice, F-SGD offers streaming-friendly, scalable estimation of high-dimensional additive models, backed by theoretical guarantees and empirical efficiency.
Abstract
This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on both the dimensionality of the data and the size of the training sample. Unlike past work, we also provide polynomial convergence rates even when the covariates do not have full support on their domain.
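To make the idea of running stochastic gradient descent on the coefficients of a truncated basis expansion concrete, here is a minimal illustrative sketch in Python. It is not the authors' algorithm. The cosine basis, the fixed truncation level J, the constant step size lr, the uniform covariates on [0, 1], and all function names are assumptions chosen for illustration, and the three-stage learning-rate schedule mentioned in the abstract is not reproduced.

```python
import numpy as np


def cosine_features(x, J):
    """Features for one sample x of shape (d,): an intercept followed by
    sqrt(2) * cos(pi * j * x_k) for each coordinate k and each j = 1..J.
    (Hypothetical basis choice; the paper may use a different one.)"""
    j = np.arange(1, J + 1)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j))  # shape (d, J)
    return np.concatenate(([1.0], phi.ravel()))


def fit_additive_sgd(X, y, J=8, lr=0.01):
    """One pass of SGD on squared loss over the basis coefficients.
    Only 1 + d * J numbers are stored, independent of the sample size."""
    n, d = X.shape
    beta = np.zeros(1 + d * J)
    for x_i, y_i in zip(X, y):
        phi = cosine_features(x_i, J)
        residual = y_i - beta @ phi
        beta += lr * residual * phi  # step along the negative gradient
    return beta


def predict(beta, X, J=8):
    return np.array([beta @ cosine_features(x, J) for x in X])


def true_function(X):
    # Assumed additive regression function, used only for this demo.
    return np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) ** 2


rng = np.random.default_rng(0)
X_train = rng.uniform(size=(5000, 2))
y_train = true_function(X_train) + 0.1 * rng.standard_normal(5000)
X_test = rng.uniform(size=(1000, 2))

beta = fit_additive_sgd(X_train, y_train)
f_test = true_function(X_test)
mse = np.mean((predict(beta, X_test) - f_test) ** 2)
baseline_mse = np.mean((f_test - f_test.mean()) ** 2)
print(f"SGD estimator MSE: {mse:.4f}, constant-predictor MSE: {baseline_mse:.4f}")
```

Each sample is processed once and then discarded, so the sketch stores only the coefficient vector rather than the data, which is the kind of memory saving the abstract describes. A fixed J and a constant step size keep the example short; they are not expected to deliver the minimax rates discussed in the paper.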
