
Witty: An Efficient Solver for Computing Minimum-Size Decision Trees

Luca Pascal Staus, Christian Komusiewicz, Frank Sommer, Manuel Sorge

TL;DR

The paper introduces Witty, an efficient solver for minimum-size perfect decision trees (trees that classify all training examples correctly), built on the witness-tree paradigm. It combines strong theoretical foundations with practical heuristics: data reductions, an improved refinement order, lower-bounding strategies (ImpLB, PairLB), subset constraints, and caching. Together these yield substantial speedups over state-of-the-art baselines. On Penn ML Benchmark datasets, Witty solves more instances and runs considerably faster than MurTree and SAT-based methods, especially on instances with a large domain size, while still returning optimal trees. The work also proves an improved worst-case running-time bound for MSDT and outlines concrete directions for extending the approach to multiclass settings, depth-constrained trees, and trees that may misclassify some training examples, with the goal of preserving its performance gains.

Abstract

Decision trees are a classic model for summarizing and classifying data. To enhance interpretability and generalization properties, it has been proposed to favor small decision trees. Accordingly, in the minimum-size decision tree training problem (MSDT), the input is a set of training examples in $\mathbb{R}^d$ with class labels and we aim to find a decision tree that classifies all training examples correctly and has a minimum number of nodes. MSDT is NP-hard and therefore presumably not solvable in polynomial time. Nevertheless, Komusiewicz et al. [ICML '23] developed a promising algorithmic paradigm called witness trees which solves MSDT efficiently if the solution tree is small. In this work, we test this paradigm empirically. We provide an implementation, augment it with extensive heuristic improvements, and scrutinize it on standard benchmark instances. The augmentations achieve a mean 324-fold (median 84-fold) speedup over the naive implementation. Compared to the state of the art, they achieve a mean 32-fold (median 7-fold) speedup over the dynamic-programming-based MurTree solver [Demirović et al., J. Mach. Learn. Res. '22] and a mean 61-fold (median 25-fold) speedup over SAT-based implementations [Janota and Morgado, SAT '20]. As a theoretical result, we obtain an improved worst-case running-time bound for MSDT.
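To make the MSDT objective concrete, here is a minimal Python sketch (not the authors' Witty code) that represents a decision tree with axis-parallel cuts, checks whether it classifies every training example correctly, and counts its nodes. The names `Leaf`, `Cut`, `predict`, `is_perfect`, `num_inner`, and `num_nodes` are hypothetical, and the convention that an example goes left when `x[dim] <= thr` is an assumption. Because every inner node has exactly two children, a tree with $s$ inner nodes has $s + 1$ leaves and $2s + 1$ nodes in total, so minimizing the number of cuts and minimizing the number of nodes are equivalent.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Leaf:
    label: int  # class assigned to every example that reaches this leaf


@dataclass(frozen=True)
class Cut:
    dim: int    # feature index (assumed convention: x[dim] <= thr goes left)
    thr: float  # threshold of the axis-parallel cut
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Cut]


def predict(tree: Tree, x: Sequence[float]) -> int:
    """Follow the classification path of x down to a leaf."""
    while isinstance(tree, Cut):
        tree = tree.left if x[tree.dim] <= tree.thr else tree.right
    return tree.label


def is_perfect(tree: Tree, X: Sequence[Sequence[float]], y: Sequence[int]) -> bool:
    """True if the tree classifies every training example correctly."""
    return all(predict(tree, x) == c for x, c in zip(X, y))


def num_inner(tree: Tree) -> int:
    """Number of cuts (inner nodes)."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + num_inner(tree.left) + num_inner(tree.right)


def num_nodes(tree: Tree) -> int:
    """Total number of nodes; always 2 * num_inner(tree) + 1 for binary trees."""
    if isinstance(tree, Leaf):
        return 1
    return 1 + num_nodes(tree.left) + num_nodes(tree.right)


# Usage: XOR-labelled points in R^2, separated with three cuts.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 1, 1, 0]
tree = Cut(0, 0.5,
           Cut(1, 0.5, Leaf(0), Leaf(1)),
           Cut(1, 0.5, Leaf(1), Leaf(0)))
print(is_perfect(tree, X, y), num_inner(tree), num_nodes(tree))  # True 3 7
```

An MSDT solver searches for a tree for which `is_perfect` holds and `num_inner` is as small as possible; this sketch only checks a given candidate.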

Paper Structure

This paper contains 58 sections, 12 theorems, 11 equations, 13 figures, 6 tables.

Key Result

Lemma 5.5

The Equivalent Cuts Rule is correct.
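Lemma 5.5 is stated here without the rule's definition. Assuming the rule discards a cut whenever another cut splits the training examples into the same two groups (keeping one representative suffices because swapping the two children of a node does not change the tree size), the following Python sketch illustrates such a reduction. Candidate cuts are taken at each distinct value of a dimension except the largest, since a cut at the largest value sends every example to the same side. The function names are hypothetical rather than Witty's API, the input is assumed to be a non-empty list of examples, and the paper's exact rule may impose different conditions.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cut = Tuple[int, float]  # (dimension, threshold); x[dim] <= thr goes left


def candidate_cuts(X: Sequence[Sequence[float]]) -> List[Cut]:
    """Thresholds at every distinct value per dimension, except the largest."""
    cuts: List[Cut] = []
    for i in range(len(X[0])):
        values = sorted({x[i] for x in X})
        cuts.extend((i, v) for v in values[:-1])
    return cuts


def split_key(X: Sequence[Sequence[float]], cut: Cut) -> FrozenSet[FrozenSet[int]]:
    """Unordered pair {left group, right group} of example indices induced by a cut."""
    i, thr = cut
    left = frozenset(k for k, x in enumerate(X) if x[i] <= thr)
    right = frozenset(range(len(X))) - left
    return frozenset({left, right})


def remove_equivalent_cuts(X: Sequence[Sequence[float]]) -> List[Cut]:
    """Keep the first cut seen for each distinct split of the examples."""
    kept: Dict[FrozenSet[FrozenSet[int]], Cut] = {}
    for cut in candidate_cuts(X):
        kept.setdefault(split_key(X, cut), cut)
    return list(kept.values())


# Usage: the two dimensions are perfectly correlated, so the cuts on
# dimension 1 duplicate the cuts on dimension 0.
X = [(0, 5), (1, 6), (2, 7)]
print(remove_equivalent_cuts(X))  # [(0, 0), (0, 1)]
```

Using an unordered pair as the key also merges cuts whose groups are mirrored, for example a cut that sends example 0 left and one that sends it right.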

Figures (13)

  • Figure 1: Examples of one-step refinements. (a) shows a current witness tree $W$ where one red example $e$ is misclassified in leaf $z$. (b), (c), and (d) show all three possible one-step refinements of $W$ where $e$ is the witness of the new leaf $t$. The orange path is the classification path of example $e$. Note that the new node can have any cut separating example $e$ from the witness of leaf $z$.
  • Figure 2: Example data set and witness tree.
  • Figure 3: Comparison of different algorithms for MSDT. For each time $t$, the plot shows how many instances each algorithm solved in less than $t$ seconds.
  • Figure 4: Per-instance comparison of the running times of Witty and MurTree; the color indicates the largest domain size $D$ of the instance.
  • Figure 5: Effect of the different improvements on the running time.
  • ...and 8 more figures
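The one-step refinement from Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each leaf stores the index of its witness example, an example goes left when `x[dim] <= thr`, the new inner node is inserted directly above some node on the classification path of $e$ (including the leaf $z$), and for each dimension in which $e$ and the witness of $z$ differ, the smaller of the two values serves as a single representative threshold among all separating cuts. The names `Leaf`, `Inner`, `classification_path`, `node_at`, `replace_at`, and `one_step_refinements` are hypothetical, and the sketch does not check any further conditions that Witty may impose on refinements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Example = Tuple[float, ...]


@dataclass(frozen=True)
class Leaf:
    witness: int  # index of the example that witnesses this leaf's class


@dataclass(frozen=True)
class Inner:
    dim: int
    thr: float  # x[dim] <= thr goes left, otherwise right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Inner]


def classification_path(root: Node, x: Example) -> List[str]:
    """Directions ('L' or 'R') that x takes from the root to its leaf."""
    steps: List[str] = []
    node = root
    while isinstance(node, Inner):
        go_left = x[node.dim] <= node.thr
        steps.append("L" if go_left else "R")
        node = node.left if go_left else node.right
    return steps


def node_at(root: Node, steps: Sequence[str]) -> Node:
    """Node reached by following the given directions from the root."""
    node = root
    for s in steps:
        assert isinstance(node, Inner)
        node = node.left if s == "L" else node.right
    return node


def replace_at(node: Node, steps: Sequence[str], new: Node) -> Node:
    """Copy of the tree with the subtree at `steps` replaced by `new`."""
    if not steps:
        return new
    assert isinstance(node, Inner)
    if steps[0] == "L":
        return Inner(node.dim, node.thr, replace_at(node.left, steps[1:], new), node.right)
    return Inner(node.dim, node.thr, node.left, replace_at(node.right, steps[1:], new))


def one_step_refinements(root: Node, X: Sequence[Example], e: int) -> List[Node]:
    """Refinements that give the misclassified example e its own new leaf t."""
    steps = classification_path(root, X[e])
    z = node_at(root, steps)
    assert isinstance(z, Leaf)
    w = z.witness
    refinements: List[Node] = []
    for depth in range(len(steps) + 1):  # every node on the path, root down to z
        prefix = steps[:depth]
        v = node_at(root, prefix)
        for i, (a, b) in enumerate(zip(X[e], X[w])):
            if a == b:
                continue  # dimension i cannot separate e from w
            t = Leaf(e)
            thr = min(a, b)  # e and w end up on opposite sides of the cut
            u = Inner(i, thr, t, v) if a < b else Inner(i, thr, v, t)
            refinements.append(replace_at(root, prefix, u))
    return refinements


# Usage: example 3 reaches leaf z (witness 2) along a path with two cuts,
# so there are three insertion points, matching Figure 1(b)-(d).
X = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
root = Inner(0, 0.5, Leaf(0), Inner(1, 0.5, Leaf(1), Leaf(2)))
refs = one_step_refinements(root, X, 3)
print(len(refs))  # 3
```

Copying only the nodes along the modified path leaves the original tree intact, which is convenient when a search branches into several refinements of the same witness tree.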

Theorems & Definitions (32)

  • Definition 5.1
  • Lemma 5.5
  • proof
  • Lemma 5.7
  • proof
  • Lemma 5.9
  • proof
  • Definition 7.1
  • Theorem 7.2: $\bigstar$
  • Lemma 7.3
  • ...and 22 more