SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration

Elif Arslan, Jacobus G. M. van der Linden, Serge Hoogendoorn, Marco Rinaldi, Emir Demirović

TL;DR

The paper addresses the challenge of using Rashomon sets (collections of near-optimal sparse decision trees) for interpretable modeling in high-stakes settings. It introduces SORTD, an anytime, best-first framework that enumerates the Rashomon set in nondecreasing order of the objective value, so enumeration can stop early and still return the best trees found so far. Two key components are a specialized depth-two subroutine that speeds up computation and a caching-based design that scales to more features and greater depths. SORTD supports any separable and totally ordered objective, and the resulting set can be post-evaluated on additional criteria such as fairness. Empirically, SORTD runs up to two orders of magnitude faster and uses about one order of magnitude less memory than the state of the art (TreeFARMS), and it also applies to regression and multi-objective post-evaluation, making Rashomon-set analysis more practical for real-world model selection and explanation.

Abstract

Sparse decision tree learning provides accurate and interpretable predictive models that are ideal for high-stakes applications by finding the single most accurate tree within a (soft) size limit. Rather than relying on a single "best" tree, Rashomon sets (trees with similar performance but varying structures) can be used to enhance variable importance analysis, enrich explanations, and enable users to choose simpler trees or those that satisfy stakeholder preferences (e.g., fairness) without hard-coding such criteria into the objective function. However, because finding the optimal tree is NP-hard, enumerating the Rashomon set is inherently challenging. Therefore, we introduce SORTD, a novel framework that improves scalability and enumerates trees in the Rashomon set in order of the objective value, thus offering anytime behavior. Our experiments show that SORTD reduces runtime by up to two orders of magnitude compared with the state of the art. Moreover, SORTD can compute Rashomon sets for any separable and totally ordered objective and supports post-evaluating the set using other separable (and partially ordered) objectives. Together, these advances make exploring Rashomon sets more practical in real-world applications.
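
To make the idea of ordered, anytime enumeration concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes an additive objective, so that the cost of a tree rooted at a branching node is the sum of the costs of its left and right subtrees, and it assumes each child already exposes its solutions as a cost list sorted in nondecreasing order. The names `sorted_pair_sums`, `rashomon_prefix`, and `bound` are hypothetical and chosen only for illustration.

```python
import heapq
from typing import Iterator, List, Tuple


def sorted_pair_sums(
    left: List[float], right: List[float]
) -> Iterator[Tuple[float, int, int]]:
    """Lazily yield (left[i] + right[j], i, j) in nondecreasing order of the sum.

    Assumption: both lists are already sorted in nondecreasing order.
    """
    if not left or not right:
        return
    # The cheapest combination pairs the best solution of each child.
    heap = [(left[0] + right[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        cost, i, j = heapq.heappop(heap)
        yield cost, i, j
        # Only the two index neighbors can be the next-best candidates
        # that are not yet on the frontier.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (left[ni] + right[nj], ni, nj))


def rashomon_prefix(
    left: List[float], right: List[float], bound: float
) -> Iterator[Tuple[float, int, int]]:
    """Yield combinations until the cost exceeds the (hypothetical) bound."""
    for cost, i, j in sorted_pair_sums(left, right):
        if cost > bound:
            break  # everything after this point is at least as costly
        yield cost, i, j


if __name__ == "__main__":
    # Toy costs for the left and right subtrees of one branching node.
    for cost, i, j in rashomon_prefix([1, 3, 4], [0, 2, 5], bound=4):
        print(cost, i, j)
```

Because results arrive in nondecreasing order, a caller can stop consuming the generator at any point and still hold the best combinations seen so far, which is the anytime property in miniature. Only the frontier of candidate pairs is kept in the heap, so the full cross product of the two lists is never materialized.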

Paper Structure

This paper contains 52 sections, 9 equations, 10 figures, 4 tables, and 6 algorithms.

Figures (10)

  • Figure 1: Search tree structure. The left-most node is the current search node with its sorted solution list. The middle nodes are branching nodes with features $f_1$ and $f_2$. The right-most is a leaf node.
  • Figure 2: Next solution calculation in a branching node.
  • Figure 3: Cumulative runtime (s) distribution across tree depths $d$ and Rashomon set sizes $n^T$. The x-axis is logarithmic and shows the runtime for enumerating the full Rashomon set. SORTD is up to two orders of magnitude faster than TreeFARMS.
  • Figure 4: Cumulative runtime (s) distribution across varying feature dimensionality, with depth budget four and $n^T = 10^6$. The x-axis is logarithmic and shows the runtime for enumerating the full Rashomon set. SORTD scales better with more features than TreeFARMS.
  • Figure 5: Cumulative memory usage (GB) distribution across tree depths $d$ and Rashomon set sizes $n^T$. Note the logarithmic x-axis. SORTD uses one order of magnitude less memory than TreeFARMS.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Example 1
  • Example 2
  • Example 3