Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree

Giulia Di Teodoro; Marta Monaci; Laura Palagi

TL;DR

This work addresses the interpretability gap in tree ensembles (TEs) by introducing two TE-informed components: VITE, a hierarchical heatmap visualization of feature usage across forest levels, and MIRET, a MILP-based surrogate multivariate tree of fixed depth that mimics a target TE. MIRET integrates TE-driven information (level frequencies, proximity among samples, and class probabilities) to constrain and weight the surrogate's splits, promoting sparsity and fidelity. Computational results on ten UCI binary classification datasets show that MIRET achieves high fidelity to the TE and competitive test accuracy while using far fewer features than the original forest, making the surrogate interpretable. Together, VITE and MIRET offer practical, TE-aware tools for understanding, validating, and deploying tree ensembles in high-stakes settings.

Abstract

The interpretability of models has become a crucial issue in Machine Learning because of the growing impact of algorithmic decisions on real-world applications. Tree ensemble methods, such as Random Forests or XGBoost, are powerful learning tools for classification tasks. However, while combining multiple trees may provide higher prediction quality than a single one, it sacrifices interpretability, resulting in "black-box" models. In light of this, we aim to develop an interpretable representation of a tree-ensemble model that can provide valuable insights into its behavior. First, given a target tree-ensemble model, we develop a hierarchical visualization tool based on a heatmap representation of the forest's feature use, considering the frequency of a feature and the level at which it is selected as an indicator of importance. Next, we propose a mixed-integer linear programming (MILP) formulation for constructing a single optimal multivariate tree that accurately mimics the target model predictions. The goal is to provide an interpretable surrogate model based on oblique hyperplane splits, which uses only the most relevant features according to the defined forest's importance indicators. The MILP model includes a penalty on the selection of features based on their frequency in the forest to further induce sparsity of the splits. The natural formulation has been strengthened to improve the computational performance of mixed-integer software. Computational experiments are carried out on benchmark datasets from the UCI repository using a state-of-the-art off-the-shelf solver. Results show that the proposed model is effective in yielding a shallow interpretable tree approximating the tree-ensemble decision function.
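The two quantities the abstract relies on, per-level feature usage and agreement between the surrogate and the ensemble, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy tree encoding, the function names `level_counts`, `level_frequency`, and `fidelity`, and the per-level normalization are not taken from the paper, whose Definitions 5.1 and 6.1 may differ in detail.

```python
from collections import Counter

# Hypothetical toy encoding (not the paper's): an internal node is a tuple
# (feature_index, left_subtree, right_subtree); a leaf is None.

def level_counts(tree, depth=0, counts=None):
    """Count how often each feature is used at each depth of one tree."""
    if counts is None:
        counts = Counter()
    if tree is None:  # leaf: no split, nothing to count
        return counts
    feature, left, right = tree
    counts[(depth, feature)] += 1
    level_counts(left, depth + 1, counts)
    level_counts(right, depth + 1, counts)
    return counts

def level_frequency(forest, n_features, max_depth):
    """Return a max_depth x n_features matrix of per-level usage shares."""
    total = Counter()
    for tree in forest:
        total.update(level_counts(tree))
    heat = []
    for d in range(max_depth):
        row = [total[(d, j)] for j in range(n_features)]
        n_splits = sum(row)
        # Assumed normalization: share of all splits made at level d.
        heat.append([c / n_splits if n_splits else 0.0 for c in row])
    return heat

def fidelity(te_pred, surrogate_pred):
    """Fraction of samples on which the surrogate agrees with the ensemble."""
    agree = sum(a == b for a, b in zip(te_pred, surrogate_pred))
    return agree / len(te_pred)

# Three toy trees of depth 2 over three features.
forest = [
    (0, (1, None, None), (2, None, None)),
    (0, (1, None, None), None),
    (2, (0, None, None), (1, None, None)),
]
heat = level_frequency(forest, n_features=3, max_depth=2)
```

Each row of `heat` corresponds to one tree level and each column to one feature, which is the layout a heatmap of feature usage by depth would plot. In this toy forest, feature 0 dominates the root level while feature 1 dominates the level below.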

Paper Structure (30 sections, 44 equations, 8 figures, 21 tables)

Introduction
State of the art on Interpretative models for Tree Ensemble
Internal processing
Post-Hoc approaches
Our contribution
Basic definition and preliminaries
VITE: a hierarchical VIsualization tool for TE
MIRET: a Multivariate Interpretable REbuilt optimal Tree
Introduction
Notation
Variables
Tree structure based Constraints
Incorporating TE-driven information
TE-driven Constraints
Objective function
...and 15 more sections

Figures (8)

Figure 1: Toy example: construction of the features' usage heatmap at different depths in the TE
Figure 2: Cleveland example: features' level frequency at the three different tree levels of a RF
Figure 3: Cleveland example: Figure (a) represents the features' importance by mean decrease impurity (MDI). Figure (b) represents the features' importance by mean decrease accuracy (MDA).
Figure 4: Representation of structure and notation for a tree with depth $D=3$.
Figure 5: Distribution of proximity measures of pair of training set samples
...and 3 more figures

Theorems & Definitions (3)

Definition 5.1: Level Frequency
Definition 5.2: Node Frequency
Definition 6.1: Training data fidelity
