TE2Rules: Explaining Tree Ensembles using Rules

G Roshan Lal, Xiaotong Chen, Varun Mithal

TL;DR

TE2Rules addresses the opacity of binary classification tree ensembles by mining cross-tree node combinations to produce a small, high-precision rule list that explains the minority (positive) class. It runs Apriori on a slice of data to identify frequent node sets associated with positive predictions, converts them into interpretable if-then rules, and applies a greedy set cover to produce a compact global explanation. The resulting rule list has high fidelity to the original model on both the overall data and the minority class, with runtimes comparable to baselines and a fidelity-runtime tradeoff that is tunable through the number of stages. The work demonstrates practical interpretability gains for high-stakes tabular tasks and provides open-source code.
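The pipeline summarized above (stage-wise Apriori mining over tree nodes, followed by a greedy set cover) can be illustrated with a minimal sketch. This is not the TE2Rules implementation: the toy data, the node list, and the helper names `support`, `describe`, `mine_rules`, and `greedy_cover` are all hypothetical, and the sketch assumes that a candidate becomes a rule only when every row it matches is predicted positive by the model, and that a candidate uses at most one node per tree.

```python
from itertools import combinations

# Toy data slice (hypothetical): one dict per fruit, plus the label the
# tree ensemble assigns to it. Rows 0, 1 and 3 are predicted edible.
rows = [
    {"color": "red", "odor": "sweet", "variety": "A"},
    {"color": "red", "odor": "sweet", "variety": "B"},
    {"color": "red", "odor": "foul", "variety": "A"},
    {"color": "green", "odor": "sweet", "variety": "A"},
    {"color": "green", "odor": "foul", "variety": "B"},
    {"color": "green", "odor": "sweet", "variety": "B"},
]
model_pred = [1, 1, 0, 1, 0, 0]
positives = {i for i, y in enumerate(model_pred) if y == 1}

# Hypothetical tree nodes as (tree id, feature, value) equality tests.
nodes = [
    (0, "color", "red"), (0, "color", "green"),
    (1, "odor", "sweet"), (1, "odor", "foul"),
    (2, "variety", "A"), (2, "variety", "B"),
]

def support(cand):
    """Rows of the slice that satisfy every node condition in the candidate."""
    return {i for i, r in enumerate(rows)
            if all(r[nodes[j][1]] == nodes[j][2] for j in cand)}

def describe(cand):
    """Readable if-then condition for a set of node indices."""
    return " & ".join(f"{nodes[j][1]} = {nodes[j][2]}" for j in sorted(cand))

def mine_rules(num_stages):
    """Stage k evaluates conjunctions of k nodes; returns {rule: support}."""
    rules = {}
    frontier = {frozenset([j]) for j in range(len(nodes))}
    for stage in range(1, num_stages + 1):
        survivors = set()
        for cand in frontier:
            sup = support(cand)
            if not sup & positives:
                continue                      # prune: matches no positive row
            if sup <= positives:
                rules[describe(cand)] = sup   # pure: emit as a rule, stop extending
            else:
                survivors.add(cand)           # impure: refine in the next stage
        # Apriori join: merge survivors into candidates one node larger,
        # keeping only those whose every sub-candidate also survived.
        frontier = set()
        for a, b in combinations(survivors, 2):
            merged = a | b
            if len(merged) != stage + 1:
                continue
            if len({nodes[j][0] for j in merged}) != len(merged):
                continue                      # assumption: one node per tree
            if all(frozenset(s) in survivors for s in combinations(merged, stage)):
                frontier.add(merged)
    return rules

def greedy_cover(rules):
    """Greedy set cover: repeatedly pick the rule covering most uncovered positives."""
    uncovered, chosen = set(positives), []
    while uncovered:
        best = max(sorted(rules), key=lambda name: len(rules[name] & uncovered))
        gain = rules[best] & uncovered
        if not gain:
            break                             # leftover positives match no rule
        chosen.append(best)
        uncovered -= gain
    return chosen

rules = mine_rules(num_stages=2)
print(greedy_cover(rules))
```

On this toy slice a single stage finds no rule, while two stages find four candidate rules, two of which are enough to cover all three positive rows. That is the kind of fidelity-runtime tradeoff the stage count controls: more stages evaluate more candidates and can explain more positives.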

Abstract

Tree Ensemble (TE) models, such as Gradient Boosted Trees, often achieve the best performance on tabular datasets, but their lack of transparency makes their decision logic hard to understand. This paper introduces TE2Rules (Tree Ensemble to Rules), a novel approach for explaining binary classification tree ensemble models through a list of rules, with a particular focus on explaining the minority class. Many state-of-the-art explainers struggle with minority class explanations, making TE2Rules valuable in such cases. The rules generated by TE2Rules closely approximate the original model, giving a high-fidelity, interpretable account of its decision-making. Experimental results demonstrate that TE2Rules scales effectively to tree ensembles with hundreds of trees, achieving higher fidelity within runtimes comparable to baselines. TE2Rules allows for a trade-off between runtime and fidelity, enhancing its practical applicability. The implementation is available here: https://github.com/linkedin/TE2Rules.
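To make the notion of fidelity concrete, here is a small sketch under one assumption: fidelity is taken to be the fraction of rows on which the rule list and the tree ensemble give the same label, computed over all rows, over rows the model predicts positive, and over rows it predicts negative. The function name `fidelity` and both prediction vectors are hypothetical, chosen only for illustration.

```python
def fidelity(model_pred, rule_pred):
    """Agreement between rule-list labels and model labels, by slice."""
    def agree(indices):
        indices = list(indices)
        if not indices:
            return float("nan")  # slice is empty, agreement undefined
        hits = sum(model_pred[i] == rule_pred[i] for i in indices)
        return hits / len(indices)

    all_rows = range(len(model_pred))
    return {
        "overall": agree(all_rows),
        "positive": agree(i for i in all_rows if model_pred[i] == 1),
        "negative": agree(i for i in all_rows if model_pred[i] == 0),
    }

# Hypothetical labels: the rule list misses one of the three positive rows.
model_pred = [1, 1, 0, 1, 0, 0]
rule_pred = [1, 1, 0, 0, 0, 0]
scores = fidelity(model_pred, rule_pred)
print(scores)
```

Reporting the positive slice separately matters when positives are rare: an explainer can agree with the model on most rows overall while still missing a large share of the minority class.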

Paper Structure

This paper contains 9 sections, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Dataset
  • Figure 2: Apriori Stage 1
  • Figure 3: Apriori Stage 2
  • Figure 4: Apriori Stage 3
  • Figure 5: An example of a tree ensemble with n = 2 trees each with depth d = 2 and a slice of data used to run TE2Rules. The tree ensemble uses features like color, odor, variety of a fruit to predict if it is edible. The positive class corresponds to edible = 1.
  • ...and 3 more figures