
Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Brian Liu, Rahul Mazumder, Peter Radchenko

TL;DR

This work addresses the interpretability gap in tree ensembles, which are accurate but hard to interpret, by introducing an estimator that extracts compact, human-readable rule sets without sacrificing predictive performance. The method represents the ensemble's output as a sum of decision rules and jointly controls the number of rules and their interaction depth, using a tailored exact algorithm and a scalable approximate algorithm for computing regularization paths. The authors establish non-asymptotic prediction error bounds showing that the estimator's performance is competitive with that of an oracle subject to the same complexity constraints. Empirically, the approach outperforms existing rule-extraction methods, computes regularization paths efficiently, and yields interpretable scorecards, as demonstrated on OpenML datasets and on a real-world ICU length-of-stay (LOS) case study with practical implications for decision-making.
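To make "sums of rules" concrete: every tree predicts by routing a point to one leaf, so a tree equals the sum over its leaves of (leaf value) × (indicator that the point satisfies every split on the root-to-leaf path), and an ensemble is the sum of those rules over all trees. The interaction depth of a rule is the number of splits it conjoins. The sketch below is a minimal illustration under assumptions: the hypothetical `Node`, `Rule`, and `tree_to_rules` names are invented here, the trees are hand-built toys, and only root-to-leaf rules are produced (the paper also allows rules that terminate at internal nodes).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A condition is (feature_index, threshold, goes_left); goes_left means x[f] <= t.
Condition = Tuple[int, float, bool]

@dataclass
class Node:
    value: float = 0.0             # leaf prediction (used only at leaves)
    feature: Optional[int] = None  # split feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None  # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None

@dataclass
class Rule:
    conditions: List[Condition]
    value: float

    @property
    def depth(self) -> int:
        # Interaction depth: number of splits conjoined by the rule.
        return len(self.conditions)

    def fires(self, x: List[float]) -> bool:
        return all((x[f] <= t) == goes_left for f, t, goes_left in self.conditions)

def tree_to_rules(node: Node, path: Optional[List[Condition]] = None) -> List[Rule]:
    """Hypothetical helper: one rule per leaf, built from its root-to-leaf path."""
    path = path or []
    if node.feature is None:
        return [Rule(list(path), node.value)]
    return (tree_to_rules(node.left, path + [(node.feature, node.threshold, True)])
            + tree_to_rules(node.right, path + [(node.feature, node.threshold, False)]))

def predict_tree(node: Node, x: List[float]) -> float:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

def predict_rules(rules: List[Rule], x: List[float]) -> float:
    # Sum of the values of all rules whose conditions hold for x.
    return sum(r.value for r in rules if r.fires(x))

# Toy two-tree ensemble (hypothetical data).
tree1 = Node(feature=0, threshold=0.5, left=Node(value=1.0),
             right=Node(feature=1, threshold=2.0,
                        left=Node(value=2.0), right=Node(value=3.0)))
tree2 = Node(feature=1, threshold=1.0, left=Node(value=-0.5), right=Node(value=0.5))
rules = tree_to_rules(tree1) + tree_to_rules(tree2)

x = [0.7, 1.5]
print(predict_tree(tree1, x) + predict_tree(tree2, x), predict_rules(rules, x))  # 2.5 2.5
```

Because exactly one leaf rule per tree fires, the rule sum reproduces the ensemble prediction; extracting an interpretable model then amounts to keeping a small subset of these rules, possibly truncated to shallower depth.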

Abstract

Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, i.e., sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.
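The oracle described in the abstract picks the best linear combination of ensemble rules under a complexity constraint. As a toy illustration only, the sketch below brute-forces that kind of problem: it enumerates every subset of at most `max_rules` candidate rules whose depth is at most `max_depth`, refits coefficients by least squares, and keeps the subset with the smallest residual sum of squares. This exhaustive search is a swapped-in technique, not the paper's tailored exact algorithm or its path algorithm, and it is exponential in the number of rules. The hard caps on rule count and depth, the omitted intercept, the tiny ridge term, and all function names are assumptions made for the illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def ls_fit(cols: List[Sequence[float]], y: Sequence[float], ridge: float = 1e-9):
    """Least squares via normal equations; the tiny ridge keeps them nonsingular."""
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    w = solve(A, rhs)
    rss = sum((yi - sum(w[j] * cols[j][n] for j in range(k))) ** 2
              for n, yi in enumerate(y))
    return w, rss

def best_subset(rule_cols, depths, y, max_rules: int, max_depth: int) -> Tuple:
    """Hypothetical oracle-style search: at most max_rules rules, each of depth <= max_depth."""
    allowed = [j for j, d in enumerate(depths) if d <= max_depth]
    best = ((), [], sum(v * v for v in y))  # start from the empty model
    for k in range(1, max_rules + 1):
        for S in combinations(allowed, k):
            w, rss = ls_fit([rule_cols[j] for j in S], y)
            if rss < best[2] - 1e-12:
                best = (S, w, rss)
    return best

# Rule indicator columns (1 = rule fires on that sample) and their depths (toy data).
cols = [[1, 1, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0], [0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 0, 1]]
depths = [1, 2, 1, 3]
y = [2, 2, 5, 3, 3, 3]  # exactly 2 * rule 0 + 3 * rule 2

S, w, rss = best_subset(cols, depths, y, max_rules=2, max_depth=2)
print(S)  # (0, 2)
```

With a budget of two rules the search recovers rules 0 and 2 with coefficients close to 2 and 3; lowering `max_rules` or `max_depth` shrinks the feasible set, which is the trade-off the estimator controls jointly.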

Paper Structure

This paper contains 59 sections, 8 theorems, 74 equations, 27 figures, 6 tables, and 4 algorithms.

Key Result

Proposition 1

Problem CIP_reformulation is an equivalent reformulation of Problem tree_based_problem1.

Figures (27)

  • Figure 1: Decision tree fit to predict household income using survey responses. Feature speduc is the spouse’s education (years); class is the self-identified income class.
  • Figure 3: Our proposed framework, which jointly prunes depth and rules, compared with existing pruning approaches.
  • Figure 4: Set $\mathcal{C}_i^t$ represents the descendants of node $i$ in tree $t$, e.g., $\mathcal{C}_3^t = \{6,7 \}$. If node 3 is selected as the terminal node of an extracted rule, then nodes 6 and 7 cannot be selected.
  • Figure 5: Example regularization path computed by our framework on the Wind dataset of Haslett and Raftery (1989). The path consists of a sequence of solutions that correspond to varying model sizes and levels of predictive performance.
  • Figure 6: Effects of node attributes. Compared to rule-weighting, depth-weighting extracts shallower rules and feature-weighting promotes feature sparsity.
  • ...and 22 more figures
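The descendant constraint in Figure 4 can be checked mechanically. The sketch below assumes 1-based heap numbering of a complete binary tree (children of node $k$ are $2k$ and $2k+1$), which matches the figure's example $\mathcal{C}_3^t = \{6,7\}$ for a 7-node tree; the numbering scheme and the function names `descendants` and `is_feasible` are illustrative assumptions, not the paper's code.

```python
from typing import Iterable, Set

def descendants(i: int, n_nodes: int) -> Set[int]:
    """C_i: all descendants of node i in a complete binary tree with heap indexing."""
    out: Set[int] = set()
    frontier = [2 * i, 2 * i + 1]
    while frontier:
        k = frontier.pop()
        if k <= n_nodes:  # nodes past n_nodes do not exist
            out.add(k)
            frontier.extend((2 * k, 2 * k + 1))
    return out

def is_feasible(selected_terminals: Iterable[int], n_nodes: int) -> bool:
    """True if no selected terminal node lies below another selected terminal node."""
    sel = set(selected_terminals)
    return all(not (descendants(i, n_nodes) & sel) for i in sel)

print(descendants(3, 7))        # {6, 7}
print(is_feasible({3, 4}, 7))   # True
print(is_feasible({3, 6}, 7))   # False: node 6 is a descendant of node 3
```

In an optimization formulation, the same condition is typically written as linear constraints on binary selection variables (selecting node $i$ forces the variables for all nodes in $\mathcal{C}_i^t$ to zero); the explicit check above is only meant to make the combinatorial rule visible.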

Theorems & Definitions (10)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • Remark 1
  • Remark 2
  • Corollary 1
  • Proposition 4
  • Proposition 5
  • Lemma 1