Multivariate Gaussian Approximation for Random Forest via Region-based Stabilization
Zhaoyang Shi, Chinmoy Bhattacharjee, Krishnakumar Balasubramanian, Wolfgang Polonik
TL;DR
The paper develops non-asymptotic, multivariate Gaussian approximation bounds for non-bagging, non-adaptive random forests built from k-Potential Nearest Neighbor (k-PNN) predictors under a Poisson sampling model. The central idea is region-based stabilization of the score functions, which makes the Malliavin-Stein method applicable and yields Gaussian approximation rates that depend on the growth of k and the input dimension d. The main bound controls the distance between the vector of forest predictions and a multivariate normal, with explicit dependence on the stabilization geometry and on moments. A key finding is a form of universality between k-PNN and k-NN forests: k-NN forests are a special case of k-PNN, but k-PNN exhibits rectangular, long-range dependence that necessitates region-based stabilization techniques and yields logarithmic-in-n rates. The paper also provides a general probabilistic result on Gaussian approximation of Poisson functionals with region-stabilizing scores, laying the groundwork for applications to related statistical problems and for potential extensions to adaptive forests and other regression procedures. Overall, the results offer finite-sample guarantees for multivariate Gaussian behavior of random forest predictions under weak smoothness and moment conditions, which is relevant for inference in high-dimensional nonparametric regression.
Abstract
We derive Gaussian approximation bounds for random forest predictions based on $k$-Potential Nearest Neighbors ($k$-PNNs), where the training points are given by a Poisson process, under fairly mild regularity assumptions on the data generating process. Our approach rests on the key observation that $k$-PNN based random forest predictions satisfy a certain geometric property called region-based stabilization. We also compare the rates with those of $k$-nearest neighbor-based random forests, highlighting a form of universality in our result. In the process of developing our results, we also establish a probabilistic result on multivariate Gaussian approximation bounds for general functionals of a Poisson process that are region-based stabilizing. This general result makes use of the Malliavin-Stein method and is potentially applicable to various related statistical problems.
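To make the $k$-PNN notion concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the usual rectangle-based definition: a sample point $X_i$ is a $k$-PNN of a query $x$ if the axis-aligned hyperrectangle spanned by $x$ and $X_i$ contains fewer than $k$ other sample points. The function names `k_pnn_mask` and `k_pnn_predict`, the uniform design on the unit cube, the Poisson intensity of 200, the regression function, and the unweighted average over $k$-PNNs are all assumptions made for illustration; the weighting analyzed in the paper may differ.

```python
import numpy as np


def k_pnn_mask(points, x, k):
    """Return a boolean mask marking which rows of `points` are k-PNNs of `x`.

    Assumed definition: points[i] is a k-PNN of x if the closed axis-aligned
    rectangle spanned by x and points[i] contains fewer than k other points.
    """
    points = np.asarray(points, dtype=float)
    x = np.asarray(x, dtype=float)
    lo = np.minimum(points, x)  # (n, d) lower corners of the rectangles
    hi = np.maximum(points, x)  # (n, d) upper corners of the rectangles
    # inside[i, j] is True when points[j] lies in the rectangle of points[i]
    inside = np.all(
        (points[None, :, :] >= lo[:, None, :])
        & (points[None, :, :] <= hi[:, None, :]),
        axis=2,
    )
    others = inside.sum(axis=1) - 1  # points[i] always lies in its own rectangle
    return others < k


def k_pnn_predict(points, responses, x, k):
    """Unweighted average of the responses at the k-PNNs of x (illustrative)."""
    mask = k_pnn_mask(points, x, k)
    responses = np.asarray(responses, dtype=float)
    return float(responses[mask].mean()) if mask.any() else float("nan")


# Hypothetical usage: Poisson number of points, uniform on the unit square.
rng = np.random.default_rng(0)
n = rng.poisson(200)
X = rng.uniform(size=(n, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=n)
query = np.array([0.5, 0.5])
n_pnn = int(k_pnn_mask(X, query, k=3).sum())
prediction = k_pnn_predict(X, y, query, k=3)
```

Membership of a point depends on every other sample point inside a possibly long, thin rectangle, which is one way to see the rectangular, long-range dependence that distinguishes $k$-PNNs from ordinary $k$-nearest neighbors.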
