Concentration Inequalities for Stochastic Optimization of Unbounded Objective Functions with Application to Denoising Score Matching
Jeremiah Birrell
TL;DR
This work develops concentration inequalities for stochastic optimization problems with unbounded objective functions and distributions with unbounded support. It generalizes McDiarmid's inequality to sample-dependent, locally Lipschitz bounds and derives a distribution-dependent Rademacher complexity bound for unbounded function classes. From these it establishes a uniform law of large numbers with sample reuse for settings where data are paired with easily sampled auxiliary variables, then applies the results to denoising score matching (DSM) and generative adversarial networks (GANs), quantifying the benefit of sample reuse. The key contributions are explicit $L^1$ and high-probability bounds on the statistical error of stochastic optima that accommodate heavy-tailed distributions and unbounded objectives, with practical implications for training DSM and GANs with auxiliary Gaussian inputs. Overall, the framework extends provable guarantees for stochastic optimization to realistic settings with unbounded tails.
Abstract
We derive novel concentration inequalities that bound the statistical error for a large class of stochastic optimization problems, focusing on the case of unbounded objective functions. Our derivations use two key tools: (1) a new form of McDiarmid's inequality that is based on sample-dependent one-component mean-difference bounds and that leads to a novel uniform law of large numbers for unbounded functions; and (2) a new Rademacher complexity bound for families of functions that satisfy an appropriate sample-dependent Lipschitz property, which allows for application to a large class of distributions with unbounded support. As an application of these results, we derive statistical error bounds for denoising score matching (DSM), a setting that inherently requires one to consider unbounded objective functions and distributions with unbounded support, even when the data distribution has bounded support. In addition, our results quantify the benefit of sample reuse in algorithms that employ easily sampled auxiliary random variables in addition to the training data, as in DSM, which uses auxiliary Gaussian random variables.
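To make the sample-reuse setting concrete, the following is a minimal sketch of an empirical DSM objective in which each data point is paired with several auxiliary Gaussian draws. It is an illustration under stated assumptions, not the paper's exact formulation: the single fixed noise level `sigma`, the unweighted squared-error loss, and the names `dsm_loss`, `score_fn`, and `m` are all hypothetical choices made here.

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, m, rng):
    """Empirical DSM objective averaging over m Gaussian draws per data point.

    Assumed form (illustrative): mean over i, j of
    || score_fn(x_i + sigma * z_ij) + z_ij / sigma ||^2,
    where z_ij ~ N(0, I) are auxiliary samples reused with the same x_i.
    """
    n, d = x.shape
    z = rng.standard_normal((n, m, d))            # auxiliary Gaussian variables
    x_noisy = x[:, None, :] + sigma * z           # each x_i is perturbed m times
    s = score_fn(x_noisy.reshape(n * m, d)).reshape(n, m, d)
    resid = s + z / sigma                         # target score is -z / sigma
    return float(np.mean(np.sum(resid**2, axis=-1)))

# Toy check with standard Gaussian data, where the score of the
# noise-perturbed distribution N(0, (1 + sigma^2) I) is known in closed form.
sigma = 0.5
x = np.random.default_rng(0).standard_normal((500, 2))

def true_score(y):
    return -y / (1.0 + sigma**2)

def zero_score(y):
    return np.zeros_like(y)

# Same seed for both calls so the two losses use identical auxiliary noise.
loss_true = dsm_loss(true_score, x, sigma, m=8, rng=np.random.default_rng(1))
loss_zero = dsm_loss(zero_score, x, sigma, m=8, rng=np.random.default_rng(1))
print(loss_true, loss_zero)
```

The residual `z / sigma` is unbounded in the Gaussian draw even when `x` lies in a bounded set, which is the reason the abstract gives for needing bounds on unbounded objectives. Raising `m` reuses each data point with more auxiliary samples without requiring more data.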
