Extending Explainable Ensemble Trees (E2Tree) to regression contexts

Massimo Aria, Agostino Gnasso, Carmela Iorio, Marjolein Fokkema

TL;DR

This work tackles explainability for random forest (RF) regression by extending E2Tree, a post hoc, tree-like explanation framework built on a dissimilarity-based representation derived from the co-occurrence of observations in RF nodes. It defines the co-occurrence measure $O_{ij}$ and derives the dissimilarity $d_{ij}=1-O_{ij}$. Tree growth is governed by NMSE-based stopping criteria, and a Mann–Whitney validation is used to prune branches, yielding a global explanation $\hat{O}_{ij}$ that faithfully reflects the RF structure. The approach maintains predictive accuracy while making predictor effects and interactions transparent. It is demonstrated on the Iris and Auto MPG datasets, with FMI values of around $0.90$ and $0.75$ respectively, and visualized via heatmaps and path diagrams. The work sets the stage for applying E2Tree to other tree ensembles and motivates the development of fidelity metrics for explanations. Code is available in the e2Tree package.

Abstract

Ensemble methods such as random forests (RF) have transformed the landscape of supervised learning, offering highly accurate prediction through the aggregation of multiple weak learners. Despite their effectiveness, these methods often lack transparency, which impedes users' comprehension of how RF models arrive at their predictions. Explainable ensemble trees (E2Tree) is a novel methodology for explaining random forests that provides a graphical representation of the relationship between response variables and predictors. A striking characteristic of E2Tree is that it accounts not only for the effects of predictor variables on the response but also for associations between the predictor variables, through the computation and use of dissimilarity measures. The E2Tree methodology was initially proposed for classification tasks. In this paper, we extend the methodology to regression contexts. To demonstrate the explanatory power of the proposed algorithm, we illustrate its use on real-world datasets.
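
The dissimilarity measure mentioned above is defined in the TL;DR as $d_{ij}=1-O_{ij}$, where $O_{ij}$ captures how often observations $i$ and $j$ fall in the same RF node. The following is a minimal sketch of that idea, not the e2Tree implementation: it assumes an unweighted co-occurrence (the share of trees in which two observations land in the same terminal node), whereas the paper's $O_{ij}$ may apply node-level weights. The function names and the `leaf_ids` input layout are hypothetical.

```python
def co_occurrence(leaf_ids):
    """Unweighted co-occurrence matrix (simplifying assumption).

    leaf_ids[b][i] is the terminal-node id of observation i in tree b.
    Returns O where O[i][j] is the fraction of trees in which
    observations i and j share a terminal node.
    """
    n_trees = len(leaf_ids)
    n_obs = len(leaf_ids[0])
    counts = [[0] * n_obs for _ in range(n_obs)]
    for tree in leaf_ids:
        for i in range(n_obs):
            for j in range(n_obs):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


def dissimilarity(O):
    """Element-wise d_ij = 1 - O_ij."""
    return [[1.0 - o for o in row] for row in O]


# Hypothetical leaf assignments: 4 trees, 3 observations.
leaf_ids = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [2, 2, 2],
]
O = co_occurrence(leaf_ids)
D = dissimilarity(O)
```

In this toy input, observations 0 and 1 share a terminal node in three of the four trees, so their co-occurrence is high and their dissimilarity low. With a fitted forest, the leaf assignments would come from the forest itself rather than being written by hand.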

Paper Structure (7 sections, 7 equations, 6 figures, 5 tables, 1 algorithm)

Figures (6)

  • Figure 1: Iris data: heatmap of the matrix $O_{ij}$.
  • Figure 2: Iris data: Path visualization of E2Tree. The color intensity indicates the magnitude of the predicted values: darker shades typically correspond to higher predicted values, while lighter shades correspond to lower predicted values.
  • Figure 3: Iris data: Heatmap of the matrix $O_{ij}$ (sub-plot a) and heatmap of the matrix $\hat{O}_{ij}$ estimated by E2Tree (sub-plot b).
  • Figure 4: Iris data: Comparison between partitions ($k=5$) obtained applying hierarchical clustering analysis on $O_{ij}$ (left-hand side) and $\hat{O}_{ij}$ (right-hand side).
  • Figure 5: Auto MPG data: path visualization of E2Tree. The color intensity indicates the magnitude of the predicted values: darker shades typically correspond to higher predicted values, while lighter shades correspond to lower predicted values.
  • ...and 1 more figures
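
Figure 4 compares two partitions, one obtained from $O_{ij}$ and one from $\hat{O}_{ij}$, and the TL;DR summarizes such agreement with FMI values. Assuming FMI denotes the Fowlkes-Mallows index, the sketch below shows how the agreement between two cluster labelings can be scored by counting pairs of observations. It is an illustration with hypothetical names and toy labels, not the code used in the paper.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_a, labels_b):
    """Pair-counting Fowlkes-Mallows index between two partitions.

    TP: pairs grouped together in both partitions.
    FP: pairs together in labels_a only.
    FN: pairs together in labels_b only.
    Returns TP / sqrt((TP + FP) * (TP + FN)), or 0.0 if undefined.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_a:
            fp += 1
        elif same_b:
            fn += 1
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom > 0 else 0.0


# Toy labelings (hypothetical): identical partitions score 1.0,
# partitions that never group the same pair together score 0.0.
identical = fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])
disjoint = fowlkes_mallows([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on which observations are grouped together, not on the label values, which is why relabeled but identical partitions still score 1.0.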