Scalable Subsampling Inference for Deep Neural Networks
Kejin Wu, Dimitris N. Politis
TL;DR
The paper tackles scalable, inference-ready deep neural network (DNN) regression by combining scalable subsampling with subagging to form a subagged DNN estimator that can be trained efficiently on large datasets. It strengthens the theory by improving existing non-asymptotic error bounds using the latest DNN approximation results, and proves that the subagging approach achieves faster convergence under mild regularity conditions, with a tunable bias regime. It then develops confidence and prediction interval methods, based on the CLT and on iterated subsampling, that handle potential bias, and shows that they are asymptotically valid and can be refined for finite samples. Extensive simulations show substantial computational savings, competitive point estimation accuracy, and CI/PI procedures that perform well in finite samples, supporting practical deployment of scalable, inference-ready DNNs.
Abstract
Deep neural networks (DNN) have received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest results on the approximation ability of DNN. More importantly, however, a non-random subsampling technique, scalable subsampling, is applied to construct a 'subagged' DNN estimator. Under regularity conditions, it is shown that the subagged DNN estimator is computationally efficient without sacrificing accuracy for either estimation or prediction tasks. Beyond point estimation/prediction, we propose different approaches to build confidence and prediction intervals based on the subagged DNN estimator. In addition to being asymptotically valid, the proposed confidence/prediction intervals appear to work well in finite samples. All in all, the scalable subsampling DNN estimator offers the complete package in terms of statistical inference, i.e., (a) computational efficiency; (b) point estimation/prediction accuracy; and (c) allowing for the construction of practically useful confidence and prediction intervals.
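
The subagging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The subsamples are taken to be deterministic blocks of `b` consecutive observations, started every `h` observations. A low-degree polynomial fit stands in for the DNN to keep the example small. The names `scalable_subsample_indices`, `fit_poly`, and `subagged_predict`, the block settings, and the synthetic data are all hypothetical.

```python
import numpy as np

def scalable_subsample_indices(n, b, h):
    """Deterministic blocks of size b, started every h observations (assumed scheme)."""
    q = (n - b) // h + 1  # number of blocks that fit in n observations
    return [np.arange(j * h, j * h + b) for j in range(q)]

def fit_poly(x, y, degree=5):
    """Stand-in base learner (NOT a DNN): least-squares polynomial fit."""
    coefs = np.polyfit(x, y, degree)
    return lambda x_new: np.polyval(coefs, x_new)

def subagged_predict(x, y, x_new, b, h, fit=fit_poly):
    """Fit the base learner on each block and average the predictions."""
    blocks = scalable_subsample_indices(len(x), b, h)
    preds = np.stack([fit(x[idx], y[idx])(x_new) for idx in blocks])
    return preds.mean(axis=0)

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(n)

x_new = np.linspace(-0.9, 0.9, 5)
y_hat = subagged_predict(x, y, x_new, b=200, h=200)  # h = b: non-overlapping blocks
```

Each base learner sees only `b` observations, so the fits are cheap and can run in parallel. Averaging over the blocks is what is meant to recover accuracy. Swapping `fit_poly` for a function that trains a neural network on a block would bring the sketch closer to the setting the abstract describes.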
