Leveraging Predictive Equivalence in Decision Trees
Hayden McTavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin
TL;DR
This paper tackles predictive equivalence in decision trees: multiple trees can encode the same decision boundary while differing in the order in which they evaluate features. It introduces a minimal Disjunctive Normal Form representation $\mathcal{T}_{DNF}$, built via a modified Quine–McCluskey procedure, and complements it with the Blake canonical form, which captures all minimal sufficient conditions for a prediction. This provides a faithful, complete, and succinct global description of a tree's predictions. Using this representation, the authors show how to handle test-time missing data robustly, stabilize variable importance measures across Rashomon sets, and apply a Q-learning approach to minimize the cost of evaluating a tree without altering its decision boundary. Empirical results on four binary datasets show that correcting for predictive equivalence reduces bias in the Rashomon Importance Distribution (RID), enables many predictions without imputation, and lowers evaluation cost, highlighting practical benefits for interpretability and deployment. Overall, the work offers a principled framework for reasoning about and leveraging predictive equivalence in decision trees, with potential extensions to ensembles and group structures.
Abstract
Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be represented by many different decision trees. The presence of models with identical decision boundaries but different evaluation processes makes model selection challenging. The models will have different variable importance and behave differently in the presence of missing values, but most optimization procedures will arbitrarily choose one such model to return. We present a boolean logical representation of decision trees that does not exhibit predictive equivalence and is faithful to the underlying decision boundary. We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions.
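To make predictive equivalence concrete, the sketch below builds two toy trees that both compute x0 AND x1 but split on the features in opposite orders. It is an illustration only: the tuple encoding, `TREE_A`, `TREE_B`, and all function names are assumptions introduced here, and the brute-force enumeration over completions is a stand-in for the paper's DNF-based representation, not the authors' algorithm.

```python
from itertools import product

# Hypothetical encoding: a tree is a leaf label (0 or 1) or a tuple
# (feature_index, subtree_if_false, subtree_if_true).
TREE_A = (0, 0, (1, 0, 1))  # splits on x0 first, then x1
TREE_B = (1, 0, (0, 0, 1))  # splits on x1 first, then x0


def predict(tree, x):
    """Traverse the tree on a fully observed binary sample x."""
    while isinstance(tree, tuple):
        feature, if_false, if_true = tree
        tree = if_true if x[feature] else if_false
    return tree


def truth_table(tree, n_features):
    """Predictions on every binary input, i.e. the decision boundary."""
    return tuple(predict(tree, x) for x in product((0, 1), repeat=n_features))


def naive_traversal(tree, x):
    """Path-based prediction: give up (None) when a split feature is missing."""
    while isinstance(tree, tuple):
        feature, if_false, if_true = tree
        if x[feature] is None:
            return None
        tree = if_true if x[feature] else if_false
    return tree


def predict_with_missing(tree, x):
    """Return a label if every completion of the missing values agrees, else None.

    Brute force (exponential in the number of missing features); it depends
    only on the decision boundary, not on the tree's split order.
    """
    missing = [i for i, value in enumerate(x) if value is None]
    labels = set()
    for values in product((0, 1), repeat=len(missing)):
        filled = list(x)
        for i, value in zip(missing, values):
            filled[i] = value
        labels.add(predict(tree, filled))
    return labels.pop() if len(labels) == 1 else None


# The two trees are predictively equivalent: identical truth tables.
same_boundary = truth_table(TREE_A, 2) == truth_table(TREE_B, 2)

# With x0 missing and x1 = 0, the traversal outcome depends on the tree chosen,
# while the boundary-based prediction does not.
sample = [None, 0]
naive_a = naive_traversal(TREE_A, sample)        # stuck at the root split on x0
naive_b = naive_traversal(TREE_B, sample)        # reaches a leaf via x1 = 0
robust_a = predict_with_missing(TREE_A, sample)
robust_b = predict_with_missing(TREE_B, sample)
```

In this toy case, path-based traversal fails on `TREE_A` but succeeds on `TREE_B` for the same sample, even though the two trees define the same function. Reasoning over the decision boundary removes that dependence on which equivalent tree the optimizer happened to return.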
