Leveraging Predictive Equivalence in Decision Trees

Hayden McTavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin

TL;DR

This paper tackles predictive equivalence in decision trees, where multiple trees encode the same decision boundary but differ in evaluation order. It introduces a minimal Disjunctive Normal Form representation $\mathcal{T}_{\textrm{DNF}}$, built via a modified Quine–McCluskey procedure and complemented by the Blake canonical form to capture all minimal sufficient conditions, providing a faithful, complete, and succinct global description of a tree's predictions. By applying this representation, the authors demonstrate robust handling of test-time missing data, stabilize variable importance measures across Rashomon sets, and develop a Q-learning approach to minimize the cost of evaluating trees without altering the decision boundary. Empirical results on four binary datasets show that correcting for predictive equivalence reduces bias in the Rashomon Importance Distribution (RID), enables many predictions without imputation, and achieves cost savings in evaluation, highlighting practical benefits for interpretability and deployment. Collectively, the work offers a principled framework to reason about and leverage predictive equivalence in decision trees, with potential extensions to ensembles and group structures.
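To make the distinction between a minimal DNF and the Blake canonical form concrete, the following is a minimal sketch that enumerates all prime implicants (minimal sufficient conditions) of a tiny Boolean function by brute force. The function `f`, the helper names, and the hand-written `MINIMAL_DNF` are illustrative assumptions, not the paper's modified Quine–McCluskey procedure, and the enumeration is only practical for a handful of features.

```python
from itertools import combinations, product

N = 3  # number of binary features X1, X2, X3 (indices 0, 1, 2)

def f(x):
    """Hypothetical tree: if X1 is true predict X2, otherwise predict X3."""
    return x[1] if x[0] == 1 else x[2]

def is_implicant(term):
    """True if every completion of the partial assignment `term` predicts 1."""
    free = [i for i in range(N) if i not in term]
    for values in product((0, 1), repeat=len(free)):
        x = dict(term)
        x.update(zip(free, values))
        if f(x) != 1:
            return False
    return True

def prime_implicants():
    """Enumerate terms by increasing size, keeping only minimal implicants."""
    primes = []
    for size in range(N + 1):
        for features in combinations(range(N), size):
            for values in product((0, 1), repeat=size):
                term = dict(zip(features, values))
                # A term that extends a smaller implicant is not minimal.
                if any(p.items() <= term.items() for p in primes):
                    continue
                if is_implicant(term):
                    primes.append(term)
    return primes

# Two terms are enough to describe f on complete samples (written by hand).
MINIMAL_DNF = [{0: 1, 1: 1}, {0: 0, 2: 1}]

def fires(terms, known):
    """True if some term is fully satisfied by the known feature values."""
    return any(t.items() <= known.items() for t in terms)
```

Under these assumptions, `prime_implicants()` returns three terms: the two in `MINIMAL_DNF` plus `{1: 1, 2: 1}` (X2 and X3 both true). When X1 is unknown but X2 and X3 are both 1, only the third term fires, which is the kind of extra sufficient condition the Blake canonical form retains.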

Abstract

Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be represented by many different decision trees. The presence of models with identical decision boundaries but different evaluation processes makes model selection challenging. The models will have different variable importance and behave differently in the presence of missing values, but most optimization procedures will arbitrarily choose one such model to return. We present a boolean logical representation of decision trees that does not exhibit predictive equivalence and is faithful to the underlying decision boundary. We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions.
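As a small illustration of why evaluation order matters under missing values, the sketch below encodes two predictively equivalent trees for the formula $X_1 \land X_2$ and compares tree traversal with a prediction based on sufficient conditions. The tuple encoding of trees, the function names, and the hand-written condition lists are assumptions made for this example, not the authors' implementation.

```python
# A tree is a leaf (0 or 1) or a tuple (feature_index, subtree_if_0, subtree_if_1).
TREE_A = (0, 0, (1, 0, 1))  # tests X1 first, then X2
TREE_B = (1, 0, (0, 0, 1))  # tests X2 first, then X1

def predict_tree(tree, x):
    """Traverse the tree; return None when a required feature is missing."""
    while isinstance(tree, tuple):
        feature, if_zero, if_one = tree
        if x[feature] is None:
            return None
        tree = if_one if x[feature] == 1 else if_zero
    return tree

# Sufficient conditions for each class of X1 AND X2, written by hand.
POSITIVE_TERMS = [{0: 1, 1: 1}]
NEGATIVE_TERMS = [{0: 0}, {1: 0}]

def predict_from_conditions(x):
    """Predict whenever the known values satisfy some sufficient condition."""
    def satisfied(term):
        return all(x[f] == v for f, v in term.items())
    if any(satisfied(t) for t in POSITIVE_TERMS):
        return 1
    if any(satisfied(t) for t in NEGATIVE_TERMS):
        return 0
    return None  # the known values do not determine the prediction

sample = (None, 0)  # X1 is missing, X2 is 0
print(predict_tree(TREE_A, sample))     # traversal stops at the missing X1
print(predict_tree(TREE_B, sample))     # X2 = 0 already reaches a leaf
print(predict_from_conditions(sample))  # independent of split order
```

Both trees agree on every complete sample, yet only `TREE_B` can classify this incomplete sample by traversal. The condition-based prediction does not depend on which of the two trees was returned by the learning algorithm.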

Paper Structure

This paper contains 49 sections, 12 theorems, 2 equations, 20 figures, 5 tables, 7 algorithms.

Key Result

Proposition 3.1

Consider any tree $\mathcal{T}$ and let $x \in \mathbb{R}^d$ be a complete sample. Then $\mathcal{T}_{\textrm{DNF}}(x) = \mathcal{T}(x)$.
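The faithfulness property can be checked exhaustively on a toy example. The sketch below assumes binary features and builds an unsimplified DNF with one conjunction per positive leaf. This is a stand-in for the paper's minimized $\mathcal{T}_{\textrm{DNF}}$, which describes the same Boolean function, and the tree encoding and function names are hypothetical.

```python
from itertools import product

# A tree is a leaf (0 or 1) or a tuple (feature_index, subtree_if_0, subtree_if_1).
EXAMPLE_TREE = (0, (1, 0, 1), (2, 0, 1))

def positive_paths(tree, path=()):
    """Yield one conjunction (dict: feature -> value) per leaf that predicts 1."""
    if not isinstance(tree, tuple):
        if tree == 1:
            yield dict(path)
        return
    feature, if_zero, if_one = tree
    yield from positive_paths(if_zero, path + ((feature, 0),))
    yield from positive_paths(if_one, path + ((feature, 1),))

def evaluate_tree(tree, x):
    """Standard top-down evaluation on a complete sample."""
    while isinstance(tree, tuple):
        feature, if_zero, if_one = tree
        tree = if_one if x[feature] == 1 else if_zero
    return tree

def evaluate_dnf(terms, x):
    """A DNF predicts 1 exactly when at least one conjunction is satisfied."""
    return int(any(all(x[f] == v for f, v in t.items()) for t in terms))

def is_faithful(tree, n_features):
    """Compare tree and DNF on every complete binary sample."""
    terms = list(positive_paths(tree))
    return all(
        evaluate_tree(tree, x) == evaluate_dnf(terms, x)
        for x in product((0, 1), repeat=n_features)
    )

print(is_faithful(EXAMPLE_TREE, 3))
```

The exhaustive check is only feasible for a few features. It is meant to illustrate what the proposition asserts, not to replace its proof.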

Figures (20)

  • Figure 1: Two decision trees, suggesting a different evaluation order, but which represent the same logical formula $(X_1 \land X_2)$.
  • Figure 2: An example of a decision tree where the minimal DNF and Blake canonical forms differ. The minimal DNF of this tree describes the tree's behaviour with two cases. The Blake canonical form includes a third reason for predicting True, which always falls into the preceding two cases but relies on different variables.
  • Figure 3: Two equivalent decision trees for the setting where $Y = X_1 X_2$ and $X_1, X_2 \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(0.5)$. Although they always produce identical predictions, achieve the same objective value, and are produced by the same algorithm, these two trees produce dramatically different variable importance values. gini refers to the Gini coefficient of each leaf, samples to the number of points falling into each leaf, and value denotes the number of negative (left) and positive (right) samples in the leaf.
  • Figure 4: The Gini Importance for 12 variables over 3 predictively equivalent decision trees. Here, each color represents a different tree. Even though these trees are predictively equivalent, they produce radically different variable importance values.
  • Figure 5: The distribution of variable importance from RID for three important variables on the COMPAS dataset, with and without correcting for predictive equivalence. When adjusting for predictive equivalence (shown in orange), more probability mass is given to zero importance for age and number of juvenile crimes, while high importance values receive more probability mass for number of priors. All other variables in this dataset had all probability mass at 0 importance in both cases.
  • ...and 15 more figures
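The effect described in the Figure 3 caption can be reproduced by hand in an idealized setting. The sketch below assumes the four equally likely points of $Y = X_1 X_2$ stand in for the data and computes a normalized mean-decrease-in-impurity importance, in the style commonly used for Gini importance. The tree encoding and helper names are assumptions, and the values shown in the actual figure depend on the sample that was drawn.

```python
from fractions import Fraction
from itertools import product

# Population of Y = X1 * X2 with X1, X2 uniform on {0, 1}: four equally likely rows.
DATA = [((x1, x2), x1 & x2) for x1, x2 in product((0, 1), repeat=2)]

# A tree is a leaf (0 or 1) or a tuple (feature_index, subtree_if_0, subtree_if_1).
TREE_A = (0, 0, (1, 0, 1))  # splits on X1 at the root
TREE_B = (1, 0, (0, 0, 1))  # splits on X2 at the root

def gini(rows):
    """Gini impurity 2p(1-p) of a binary-labelled set of rows."""
    if not rows:
        return Fraction(0)
    p = Fraction(sum(y for _, y in rows), len(rows))
    return 2 * p * (1 - p)

def impurity_decrease(tree, rows, total, scores):
    """Accumulate each feature's weighted impurity decrease over its splits."""
    if not isinstance(tree, tuple):
        return scores
    feature, if_zero, if_one = tree
    left = [r for r in rows if r[0][feature] == 0]
    right = [r for r in rows if r[0][feature] == 1]
    decrease = (len(rows) * gini(rows)
                - len(left) * gini(left)
                - len(right) * gini(right)) / total
    scores[feature] = scores.get(feature, 0) + decrease
    impurity_decrease(if_zero, left, total, scores)
    impurity_decrease(if_one, right, total, scores)
    return scores

def normalized_importance(tree):
    """Scale the accumulated decreases so they sum to one."""
    raw = impurity_decrease(tree, DATA, len(DATA), {})
    total = sum(raw.values())
    return {feature: value / total for feature, value in raw.items()}

print(normalized_importance(TREE_A))  # X1 gets 1/3, X2 gets 2/3
print(normalized_importance(TREE_B))  # X1 gets 2/3, X2 gets 1/3
```

In this idealized setting the root split removes less impurity (1/8) than the second split (1/4), so whichever variable is tested second receives twice the importance of the other, even though the two trees compute the same function.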

Theorems & Definitions (19)

  • Proposition 3.1: Faithfulness
  • Theorem 3.2: Completeness
  • Proposition 3.3: Succinctness
  • Theorem 3.4: Resolution of Predictive Equivalence
  • Corollary 6.1: Irrelevance of Imputation
  • Corollary 6.2: Unbiasedness under test-time missingness
  • Lemma 1.1
  • proof
  • proof
  • Proposition : Faithfulness
  • ...and 9 more