An RKHS Perspective on Tree Ensembles

Mehdi Dagdoug, Clement Dombry, Jean-Jil Duchamps

TL;DR

This work formalizes a reproducing-kernel framework for tree ensembles by constructing kernels on the random partitions generated by randomized trees. It establishes core analytical properties of the resulting Random Forest (RF) kernel and its RKHS, gives a variational interpretation of infinite RF predictors, and interprets infinitesimal gradient boosting as a gradient flow on a Hilbert manifold. The framework offers a theoretical explanation for why tree ensembles perform well in practice, based on data-dependent geometry and regularization effects, and demonstrates practical utility through kernel PCA and a geometric variable-importance criterion (GVI). Together, the results connect ensemble methods, kernel theory, and continuous-time optimization, offering both interpretability and methodological tools for tabular data. The empirical illustrations show competitive discriminative embeddings and robust variable-importance assessments, suggesting that RF-based kernels are broadly applicable to kernel methods and interpretability tasks.
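A minimal sketch of one common way to turn randomized trees into a kernel, assuming the kernel is the probability that two points fall in the same cell of a random partition, estimated by the fraction of trees in which they share a leaf. The trees here are hypothetical "purely random" trees on $[0,1]^p$ (uniformly chosen coordinate, uniform split point inside the current cell, fixed depth), so unlike the data-dependent regression trees studied in the paper they ignore the training data. The names `grow_tree`, `leaf_path`, and `rf_kernel` are illustrative, not the authors' code.

```python
import random

def grow_tree(p, depth, rng, lower=None, upper=None):
    """Hypothetical purely random tree on [0,1]^p: each internal node splits a
    uniformly chosen coordinate at a uniform location inside the current cell.
    Returns nested tuples (coord, threshold, left, right); None marks a leaf."""
    lower = [0.0] * p if lower is None else lower
    upper = [1.0] * p if upper is None else upper
    if depth == 0:
        return None
    j = rng.randrange(p)
    t = rng.uniform(lower[j], upper[j])
    left_upper = list(upper)
    left_upper[j] = t          # left child keeps x_j < t
    right_lower = list(lower)
    right_lower[j] = t         # right child keeps x_j >= t
    return (j, t,
            grow_tree(p, depth - 1, rng, lower, left_upper),
            grow_tree(p, depth - 1, rng, right_lower, upper))

def leaf_path(tree, x):
    """Sequence of left/right decisions; it identifies the leaf containing x."""
    path = []
    node = tree
    while node is not None:
        j, t, left, right = node
        go_left = x[j] < t
        path.append(go_left)
        node = left if go_left else right
    return tuple(path)

def rf_kernel(trees, x, z):
    """Monte Carlo estimate of P(x and z share a cell): fraction of trees
    in which x and z land in the same leaf."""
    return sum(leaf_path(t, x) == leaf_path(t, z) for t in trees) / len(trees)

rng = random.Random(0)
forest = [grow_tree(p=2, depth=4, rng=rng) for _ in range(500)]
print(rf_kernel(forest, (0.1, 0.2), (0.15, 0.25)))
```

Each tree contributes the indicator kernel $\mathbf{1}\{x, z \text{ in the same leaf}\}$, whose Gram matrices are positive semi-definite, so the average is a symmetric, positive semi-definite kernel with values in $[0,1]$ and $K(x,x)=1$.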

Abstract

Random Forests and Gradient Boosting are among the most effective algorithms for supervised learning on tabular data. Both belong to the class of tree-based ensemble methods, where predictions are obtained by aggregating many randomized regression trees. In this paper, we develop a theoretical framework for analyzing such methods through Reproducing Kernel Hilbert Spaces (RKHSs) constructed on tree ensembles, or more precisely on the random partitions generated by randomized regression trees. We establish fundamental analytical properties of the resulting Random Forest kernel, including boundedness, continuity, and universality, and show that a Random Forest predictor can be characterized as the unique minimizer of a penalized empirical risk functional in this RKHS, providing a variational interpretation of ensemble learning. We further extend this perspective to the continuous-time formulation of Gradient Boosting introduced by Dombry and Duchamps, and demonstrate that it corresponds to a gradient flow on a Hilbert manifold induced by the Random Forest RKHS. A key feature of this framework is that both the kernel and the RKHS geometry are data-dependent, offering a theoretical explanation for the strong empirical performance of tree-based ensembles. Finally, we illustrate the practical potential of this approach with two tools: a kernel principal component analysis built on the Random Forest kernel, which enhances the interpretability of ensemble models, and GVI, a new geometric variable importance criterion.
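A kernel principal component analysis only needs the Gram matrix of the kernel. The sketch below assumes a shared-cell kernel averaged over random partitions and applies standard kernel PCA (double centering, then eigendecomposition). For brevity, the partitions are hypothetical random axis-aligned grids on $[0,1]^p$, a data-independent stand-in for the data-dependent tree partitions of the paper. `random_grid_kernel` and `kernel_pca` are illustrative names, and this is not claimed to be the authors' exact procedure.

```python
import numpy as np

def random_grid_kernel(X, n_partitions=200, n_cuts=3, seed=0):
    """Fraction of random grid partitions of [0,1]^p in which two rows of X
    fall in the same cell (hypothetical stand-in for tree partitions)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    K = np.zeros((n, n))
    for _ in range(n_partitions):
        # n_cuts uniform cut points per coordinate; a point's cell index along
        # coordinate j is the number of cuts lying below it.
        cuts = np.sort(rng.uniform(size=(p, n_cuts)), axis=1)
        codes = np.stack([np.searchsorted(cuts[j], X[:, j]) for j in range(p)], axis=1)
        K += np.all(codes[:, None, :] == codes[None, :, :], axis=2)
    return K / n_partitions

def kernel_pca(K, n_components=2):
    """Standard kernel PCA on a precomputed Gram matrix K; returns the
    principal-component scores of the training points."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel of centered features
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[idx], 0.0, None)  # drop tiny negative round-off
    return eigvecs[:, idx] * np.sqrt(vals)

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))
scores = kernel_pca(random_grid_kernel(X), n_components=2)
print(scores.shape)  # (100, 2)
```

With a data-dependent forest, the same `kernel_pca` step would be applied to the Gram matrix of the Random Forest kernel instead.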

Paper Structure

This paper contains 56 sections, 26 theorems, 245 equations, 4 figures, 3 tables.

Key Result

Proposition 4

Assume that either $\mathrm{P}=\mathrm{P}_n$, or that $\mathrm{P}=\mathrm{P}^*$ and the regression function $x\mapsto \mathbb{E}[Y\mid X=x]$ is bounded on $[0,1]^p$. Then the convergence in (eq:cv-rf) holds almost surely, uniformly on $[0,1]^p$.
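The equation referenced in Proposition 4 is not reproduced on this page. As a toy illustration only, assuming the setting is an average over independently drawn random partitions approaching its expectation, the sketch below tracks the worst-case gap over a grid between a Monte Carlo kernel built from $M$ random trees and its infinite-forest limit. The tree is a hypothetical depth-1 tree on $[0,1]$ with one uniform split $t$, for which $x$ and $z$ share a cell unless $t$ falls between them, so the limit is exactly $1-|x-z|$. The function names are illustrative.

```python
import random

def shared_cell(x, z, t):
    """Depth-1 tree on [0, 1] split at t: cells [0, t) and [t, 1]."""
    return (x < t) == (z < t)

def sup_error(n_trees, grid, x0=0.5, seed=0):
    """Largest gap, over grid points z, between the M-tree Monte Carlo kernel
    K_M(x0, z) and its infinite-forest limit 1 - |x0 - z|."""
    rng = random.Random(seed)
    cuts = [rng.random() for _ in range(n_trees)]  # one uniform split per tree
    worst = 0.0
    for z in grid:
        k_m = sum(shared_cell(x0, z, t) for t in cuts) / n_trees
        k_inf = 1.0 - abs(x0 - z)
        worst = max(worst, abs(k_m - k_inf))
    return worst

grid = [i / 100 for i in range(101)]
for m in (10, 100, 1000, 10000):
    print(m, round(sup_error(m, grid), 4))
```

The printed gaps should shrink roughly like $1/\sqrt{M}$, which is the uniform, almost-sure flavor of convergence that a law-of-large-numbers argument over trees provides in this toy case.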

Figures (4)

  • Figure 1: Summary of the simulation results for the PCA application. Subfigure (a) reports the classification accuracy, while subfigure (b) reports the silhouette scores.
  • Figure 2: Relative improvement of MSE versus standard PCA.
  • Figure 3: Summary of the effective sample sizes for each Random Forest algorithm over the classification datasets.
  • Figure 4: Summary of the effective sample sizes for each Random Forest algorithm over the regression datasets.

Theorems & Definitions (39)

  • Definition 1
  • Definition 2
  • Remark 3
  • Proposition 4
  • Example 1: Uniform partition in dimension 1.
  • Proposition 5
  • Remark 6
  • Proposition 7
  • Proposition 8
  • Proposition 9
  • ...and 29 more