Valid Inference for Machine Learning Model Parameters
Neil Dey, Jonathan P. Williams
TL;DR
This work addresses uncertainty quantification for machine learning model parameters, specifically the risk minimizer $θ_0$, using only training data and under weak distributional assumptions. It introduces an inferential framework that relies on a uniform convergence property and uses the set of $ε$-almost empirical risk minimizers (ERMs) $Θ_S^ε$ as a finite-sample, level $1-α$ confidence set for $θ_0$, with extensions to noncompact parameter spaces through neighborhoods $Θ_0^δ$. By viewing these confidence sets as random sets, the authors adopt concepts from imprecise probability and develop belief and plausibility functions that enable region-level inference. Bootstrapping provides a practical approximation to the distribution of these sets and yields asymptotically valid plausibilities and p-values for hypotheses about $θ_0$. A comparison with Generalized Inferential Models shows that the proposed approach delivers finite-sample guarantees and explicit power behavior while requiring weaker modeling assumptions. Overall, the framework offers principled, region-specific hypothesis testing and tuning-parameter inference for ML models without relying on strong population-level distributional information, and it applies broadly to models that satisfy uniform convergence.
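The following sketch shows why uniform convergence turns the set of $ε$-almost ERMs into a confidence set. The notation is an assumption made for illustration and may differ from the paper's: $R$ denotes the population risk, $R_S$ the empirical risk on the training sample $S$, and the tolerance is split as $ε/2$.

$$Θ_S^ε = \{θ ∈ Θ : R_S(θ) ≤ \inf_{θ' ∈ Θ} R_S(θ') + ε\}$$

Suppose that $\sup_{θ ∈ Θ} |R_S(θ) - R(θ)| ≤ ε/2$ holds with probability at least $1-α$. On that event, for every $θ ∈ Θ$,

$$R_S(θ_0) ≤ R(θ_0) + ε/2 ≤ R(θ) + ε/2 ≤ R_S(θ) + ε,$$

where the middle inequality uses that $θ_0$ minimizes $R$. Taking the infimum over $θ$ gives $θ_0 ∈ Θ_S^ε$, so under these assumptions $Θ_S^ε$ covers $θ_0$ with probability at least $1-α$.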
Abstract
The parameters of a machine learning model are typically learned by minimizing a loss function on a set of training data. However, this carries the risk of overtraining: for the model to generalize well, we need to find the parameter that is optimal for the entire population, not only for the given training sample. In this paper, we construct valid confidence sets for this optimal parameter of a machine learning model, which can be generated using only the training data without any knowledge of the population. We then show that studying the distribution of this confidence set allows us to assign a notion of confidence to arbitrary regions of the parameter space, and we demonstrate that this distribution can be well-approximated using bootstrapping techniques.
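As a rough illustration of the ideas summarized above, the Python sketch below computes a set of $ε$-almost ERMs on a parameter grid and uses bootstrap resampling to estimate belief and plausibility for a region of the parameter space. It is a minimal sketch under stated assumptions, not the authors' procedure. The squared-error loss, the grid, the function names, and the tolerance `eps` are all hypothetical choices. In particular, `eps` is supplied by hand here, whereas the framework would calibrate it from a uniform convergence bound. Belief is taken as the fraction of bootstrap sets contained in the region and plausibility as the fraction that intersect it, following the usual random-set definitions.

```python
import numpy as np


def empirical_risk(theta_grid, sample):
    # Mean squared-error loss of each candidate theta on the sample
    # (illustrative loss; any loss evaluated on a grid would work).
    return ((sample[None, :] - theta_grid[:, None]) ** 2).mean(axis=1)


def almost_erm_mask(theta_grid, sample, eps):
    # Boolean mask of grid points whose empirical risk is within eps
    # of the smallest empirical risk on the grid.
    risk = empirical_risk(theta_grid, sample)
    return risk <= risk.min() + eps


def bootstrap_belief_plausibility(theta_grid, sample, eps, region_mask,
                                  n_boot=500, seed=0):
    # Approximate the distribution of the random set by resampling the
    # training data with replacement and recomputing the set each time.
    rng = np.random.default_rng(seed)
    n = sample.shape[0]
    contained = 0  # bootstrap sets lying entirely inside the region
    hits = 0       # bootstrap sets that intersect the region
    for _ in range(n_boot):
        resample = rng.choice(sample, size=n, replace=True)
        mask = almost_erm_mask(theta_grid, resample, eps)
        contained += bool(np.all(region_mask[mask]))
        hits += bool(np.any(mask & region_mask))
    return contained / n_boot, hits / n_boot


# Hypothetical usage: estimate a mean from synthetic data.
data_rng = np.random.default_rng(1)
sample = data_rng.normal(loc=2.0, scale=1.0, size=200)
theta_grid = np.linspace(0.0, 4.0, 401)
region_mask = (theta_grid >= 1.5) & (theta_grid <= 2.5)  # region A = [1.5, 2.5]

belief, plausibility = bootstrap_belief_plausibility(
    theta_grid, sample, eps=0.04, region_mask=region_mask
)
print(f"belief={belief:.3f}, plausibility={plausibility:.3f}")
```

Because each bootstrap set is nonempty (it always contains the grid minimizer), any set contained in the region also intersects it, so the estimated belief never exceeds the estimated plausibility.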
