Asymptotic confidence bands for centered purely random forests

Natalie Neumeyer, Jan Rabe, Mathias Trabs

TL;DR

This paper develops asymptotic uniform confidence bands for regression functions estimated by centered purely random forests (CPRFs) in a multivariate nonparametric setting. By interpreting CPRFs as generalized U-statistics and applying Gaussian approximation techniques for the supremum of empirical processes, it derives a nonparametric confidence band around the CPRF estimator that adapts to the local variance via $\Psi_k(x)$ and does not rely on a limit distribution. The authors introduce the Ehrenfest CPRF, which achieves minimax optimal rates, and provide a pointwise CLT as well as a uniform convergence result, leading to a practical method for constructing confidence bands in multivariate settings. Simulation studies illustrate finite-sample performance, showing favorable coverage and band radii relative to histogram-based methods, with the band width adapting to the local feature density.

Abstract

In a multivariate nonparametric regression setting we construct explicit asymptotic uniform confidence bands for centered purely random forests. Since the most popular example in this class of random forests, namely the uniformly centered purely random forests, is well known to suffer from suboptimal rates, we propose a new type of purely random forests, called the Ehrenfest centered purely random forests, which achieve minimax optimal rates. Our main confidence band theorem applies to both random forests. The proof is based on an interpretation of random forests as generalized U-Statistics together with a Gaussian approximation of the supremum of empirical processes. Our theoretical findings are illustrated in simulation examples.
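
As a rough illustration of the estimator class described above, the following Python sketch builds a simplified centered purely random forest on $[0,1)^p$: each tree performs $k$ midpoint splits along randomly drawn coordinates and predicts the average response in the cell containing the query point. This is not the paper's definition or code. The function names, the simplification that one coordinate is drawn uniformly per depth level (so all cells at that depth are split along it), and the convention of predicting 0 in empty cells are assumptions made for illustration only.

```python
import numpy as np

def _cell_index(points, depth):
    # Dyadic cell index per coordinate after depth[j] midpoint splits in coordinate j.
    idx = np.floor(np.asarray(points) * 2.0 ** depth).astype(int)
    return np.minimum(idx, 2 ** depth - 1)  # keep points on the boundary 1.0 in the last cell

def fit_centered_tree(X, y, k, rng):
    """Fit one simplified centered purely random tree with k midpoint splits.

    Assumption: one coordinate is drawn uniformly at random per depth level.
    """
    p = X.shape[1]
    coords = rng.integers(0, p, size=k)        # coordinate chosen at each depth
    depth = np.bincount(coords, minlength=p)   # number of splits per coordinate
    sums, counts = {}, {}
    for cell, yi in zip(map(tuple, _cell_index(X, depth)), y):
        sums[cell] = sums.get(cell, 0.0) + yi
        counts[cell] = counts.get(cell, 0) + 1
    means = {cell: sums[cell] / counts[cell] for cell in sums}
    return depth, means

def predict_tree(tree, x):
    depth, means = tree
    cell = tuple(_cell_index(x, depth))
    return means.get(cell, 0.0)                # assumed convention: 0 in empty cells

def predict_forest(trees, x):
    # The forest estimate is the average of the individual tree estimates.
    return float(np.mean([predict_tree(t, x) for t in trees]))
```

A forest is then a list of independently drawn trees, for example `trees = [fit_centered_tree(X, y, 5, rng) for _ in range(100)]` with `rng = np.random.default_rng(0)`. The confidence band construction itself is not sketched here, since it depends on quantities defined in the paper.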

Paper Structure

This paper contains 19 sections, 25 theorems, 263 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Lemma 3.1

The uniform CPRF satisfies for $q\in(0,2]$

Figures (5)

  • Figure 1: Scatter plot of estimation errors (blue dots) of ten uniform CPRF estimators with $n=2000$, $k=5$ and $\varepsilon\sim\mathcal{N}(0,1)$ on a non-equidistant test grid, together with confidence band radii (red lines) corresponding to confidence levels $1-\beta\in\{0.9,0.95,0.99\}$. The horizontal axis corresponds to the test grid of the feature space from (\ref{eq:sup test grid}). Blocks of $32$ values on the horizontal axis correspond to grid entries that share the same first component.
  • Figure 2: Comparison of confidence band coverage and radii for the uniform CPRF, the Ehrenfest CPRF and a histogram regression estimator as functions of $n$. The left column shows the empirical coverage together with the nominal coverage as a horizontal line. The right column shows the radii on a logarithmic scale. From top to bottom, the plots are for $1-\beta\in\{0.9,0.95,0.99\}$.
  • Figure 3: Realization of a uniform CPRF (top left) based on the observations shown in the heat map (top right), histogram estimation on the $2^{k}$-grid as a reference (bottom left) and true regression function $m$ from (\ref{eq:m sim p2}) (bottom right).
  • Figure 4: Comparison of confidence band coverage and radii for the uniform CPRF and the Ehrenfest CPRF as functions of $n$ and the dimension $p=2,4$. The left column shows the empirical coverage together with the nominal coverage as a horizontal line. The right column shows the radii on a logarithmic scale. From top to bottom, the plots are for $1-\beta\in\{0.9,0.95,0.99\}$.
  • Figure 5: Estimated densities of standardized $\Vert U_{n,r_{n},\omega}^{(\varepsilon)}\Vert_{\infty}$ for different $n$ and $\mathbf{S}_{k}$, both for $k=5$.

Theorems & Definitions (52)

  • Lemma 3.1: Characteristics of uniform CPRF
  • Proposition 3.2: Characteristics of the Ehrenfest CPRF
  • Remark 3.3
  • Theorem 3.4: Pointwise convergence of CPRFs
  • Proposition 3.5: Pointwise central limit theorem for CPRFs
  • Theorem 3.6: Uniform convergence of a CPRF
  • Theorem 4.1: Asymptotic confidence band for CPRF
  • proof: Sketch of the proof
  • Corollary 4.2: Asymptotic confidence band for the Ehrenfest CPRF
  • proof
  • ...and 42 more