Empirical Likelihood for Random Forests and Ensembles

Harold D. Chiang, Yukitoshi Matsushita, Taisuke Otsu

TL;DR

The paper develops an empirical likelihood (EL) framework for random forests and ensembles by modeling their predictions as incomplete $U$-statistics generated by subsampling. It constructs a jackknife-after-subsampling EL statistic that is asymptotically $\chi^2_1$ under dense subsampling, and introduces a modified EL that restores asymptotic pivotality under sparse subsampling. The theory is specialized to honest random forests under verifiable low-level conditions, and simulations show that the modified EL delivers accurate coverage relative to existing methods, especially in challenging sparse regimes. Overall, the approach provides a computationally efficient, range-preserving inference tool for ensemble methods, enabling principled uncertainty quantification in high-dimensional settings.
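To make the incomplete $U$-statistic structure concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's estimator: the function names, the `kernel` argument (standing in for a base learner's prediction at a fixed query point), and the choice to draw subsamples independently are all hypothetical. A complete $U$-statistic averages the kernel over every size-$s$ subset, while an incomplete one averages over only a random selection of them, much as an ensemble averages base learners fitted on subsamples.

```python
import itertools
import random
import statistics


def complete_u_statistic(data, s, kernel):
    """Average the kernel over all size-s subsets of data."""
    subsets = itertools.combinations(data, s)
    return statistics.fmean(kernel(subset) for subset in subsets)


def incomplete_u_statistic(data, s, kernel, n_subsamples, seed=0):
    """Average the kernel over n_subsamples randomly drawn size-s subsets.

    Assumption for illustration: subsets are drawn independently, each
    without replacement within the subset.
    """
    rng = random.Random(seed)
    draws = (kernel(rng.sample(data, s)) for _ in range(n_subsamples))
    return statistics.fmean(draws)


# Toy kernel: the subsample mean (a real ensemble would use a tree prediction).
data = [1.0, 2.0, 4.0, 7.0]
full = complete_u_statistic(data, 2, statistics.fmean)
approx = incomplete_u_statistic(data, 2, statistics.fmean, n_subsamples=50)
```

With the subsample mean as kernel, the complete version reproduces the sample mean, and the incomplete version approximates it at a fraction of the cost. How many subsamples are drawn relative to the sample size is what separates dense from sparse regimes.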

Abstract

We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when subsampling induced by incompleteness is not overly sparse. Under sparser subsampling regimes, the EL statistic tends to over-cover due to loss of pivotality; we therefore propose a modified EL that restores pivotality through a simple adjustment. Our method retains key properties of EL while remaining computationally efficient. Theory for honest random forests and simulations demonstrate that modified EL achieves accurate coverage and practical reliability relative to existing inference methods.
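As a point of reference for how an EL statistic is computed, the sketch below implements the textbook empirical likelihood ratio for a mean. It is not the paper's jackknife-after-subsampling or modified EL construction. The function name `el_log_ratio` is hypothetical, and `values` is a stand-in for whatever per-observation quantities (for example, jackknife-type pseudo-values) the EL is built on. The Lagrange multiplier $\lambda$ solves $\sum_i z_i / (1 + \lambda z_i) = 0$ with $z_i = v_i - \theta$, and the statistic is $2 \sum_i \log(1 + \lambda z_i)$.

```python
import math


def el_log_ratio(values, theta, n_iter=100):
    """-2 log empirical likelihood ratio for the hypothesis mean == theta."""
    z = [v - theta for v in values]
    z_min, z_max = min(z), max(z)
    # theta must lie strictly inside the range of the data; otherwise the
    # EL ratio is zero and the statistic is infinite.
    if z_min >= 0.0 or z_max <= 0.0:
        return math.inf
    # Feasible multipliers keep every weight positive: 1 + lam * z_i > 0,
    # which means lam lies in the open interval (-1/z_max, -1/z_min).
    lo, hi = -1.0 / z_max, -1.0 / z_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # score is strictly decreasing in lam, so bisection finds its unique root.
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


sample = [1.0, 2.0, 3.0, 4.0]
stat_at_mean = el_log_ratio(sample, 2.5)
stat_near = el_log_ratio(sample, 2.0)
stat_far = el_log_ratio(sample, 1.5)
```

When the statistic is asymptotically $\chi^2_1$, a 95% confidence set collects every `theta` whose statistic is at most about 3.841, the 0.95 quantile of $\chi^2_1$. Because candidates outside the data range receive an infinite statistic, the resulting interval never leaves that range, which is the range-preserving property of EL.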

Paper Structure

This paper contains 14 sections, 13 theorems, 95 equations, 1 figure, 2 algorithms.

Key Result

Theorem 1

Suppose that Assumption asm:JEL holds. Then, under the dense-subsampling asymptotics in eq:dense, the EL statistic is asymptotically $\chi^2_1$.

Figures (1)

  • Figure 1: Coverage probabilities of 95% confidence intervals across sample sizes $n \in \{200,400,800\}$, linear (MLR) and nonlinear (MARS) DGPs, and four procedures (IJ, J, EL, mEL). The horizontal line marks the nominal 0.95 level.

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2
  • Lemma 3
  • Theorem 4: modified EL for honest random forest
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 3 more