Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance

Jacobus G. M. van der Linden, Daniël Vos, Mathijs M. de Weerdt, Sicco Verwer, Emir Demirović

TL;DR

This study clarifies when optimal decision trees surpass greedy approaches by dissecting training objectives and tuning. By evaluating 11 objectives and six tuning methods across 180 real and synthetic datasets, it shows that non-concave, regularized objectives can improve ODT performance, while greedy methods benefit from strictly concave criteria. Tunability is shown to be crucial for ODTs, with depth-tuning offering a fast, effective option, and the authors provide practical recommendations and open-source code to enable fair benchmarking. The findings also reveal nuanced, data-dependent differences between ODTs and greedy trees, especially regarding interpretability and scalability, and advocate for standardized evaluation practices. Collectively, the work advances understanding of ODT design, offers actionable guidance for practitioners, and lays groundwork for future, more scalable optimal-tree methods.

Abstract

Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly, in contrast to traditional approaches that locally optimize an impurity or information metric. However, the value of optimal methods is not yet well understood: the literature provides conflicting results, with some studies demonstrating superior out-of-sample performance of ODTs over greedy approaches and others showing the opposite. Through a novel, extensive experimental study, we provide new insights into the design and behavior of decision tree learning. In particular, we identify and analyze two relatively unexplored aspects of ODTs: the objective function used in training trees, and tuning techniques. We thus address three questions: what objective should ODTs optimize, how should ODTs be tuned, and how do optimal and greedy methods compare? Our experimental evaluation examines 11 objective functions, six tuning methods, and six claims from the literature on optimal and greedy methods on 180 real and synthetic data sets. Through our analysis, both conceptual and experimental, we show the effect of (non-)concave objectives in greedy and optimal approaches; highlight the importance of proper tuning of ODTs; support and refute several claims from the literature; provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods; and release code for future comparisons.

Paper Structure (45 sections, 1 equation, 24 figures, 3 tables)

Figures (24)

  • Figure 1: (Left) Three splitting heuristics compared. The horizontal axis shows the binary class distribution expressed as the probability of the first class, and the vertical axis shows the corresponding splitting criterion value (lower is better). (Right) Geometric interpretation of the weighted mean error of two children when $p$, $p_1$, and $p_2$ represent the class distributions of the parent and the two children respectively. The length of the arrow indicates the improvement in the splitting criterion value. Adapted from Flach (2012).
  • Figure 2: Objective values for different objective functions for a single leaf node. (Left) The leaf node size is fixed at $n=100$. (Right) The misclassifications are fixed at $e=20$. Surprisingly, the value of the strictly concave objectives increases for a fixed error and increasing leaf node size.
  • Figure 3: The new objectives show opposite behavior to the strictly concave objectives. Left, the leaf node size is fixed at $n=100$. Right, the misclassifications are fixed at $e=20$.
  • Figure 4: Comparing ODT objectives for $\operatorname{max-depth}=4$. (a) Orange indicates concave objectives and blue indicates non-concave objectives. The average accuracy and number of leaf nodes over all data sets and folds are shown, sorted by the average rank. (b) Nemenyi critical distance rank test. The average rank per objective is plotted, and objectives whose rank difference is smaller than the critical distance (CD) at p-value $0.05$ are grouped by a black bar.
  • Figure 5: Comparing greedy objectives for $\operatorname{max-depth}=4$. The strictly concave objectives (orange) significantly outperform the non-concave objectives (blue).
  • ...and 19 more figures
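
The captions of Figures 1 and 2 describe how splitting criteria behave as a function of the class distribution and of leaf size. The following Python sketch illustrates both ideas for the binary case. It is not the authors' code: the function names are hypothetical, and the leaf objective `n * criterion(e / n)` is an assumed size-weighted form that this summary does not spell out.

```python
import math


def gini(p):
    """Gini impurity of a binary node; p is the probability of the first class."""
    return 2.0 * p * (1.0 - p)


def entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def error(p):
    """Misclassification rate when predicting the majority class."""
    return min(p, 1.0 - p)


def split_gain(criterion, n1, p1, n2, p2):
    """Parent criterion value minus the size-weighted mean of the two children.

    n1, n2 are child sizes; p1, p2 are the children's first-class probabilities.
    """
    n = n1 + n2
    p_parent = (n1 * p1 + n2 * p2) / n
    children = (n1 * criterion(p1) + n2 * criterion(p2)) / n
    return criterion(p_parent) - children


def leaf_objective(criterion, n, e):
    """ASSUMED form of a per-leaf objective: leaf size times criterion(e / n).

    n is the number of samples in the leaf, e the number misclassified.
    """
    return n * criterion(e / n)


if __name__ == "__main__":
    # A split that leaves the majority class unchanged in both children.
    for name, crit in [("error", error), ("gini", gini), ("entropy", entropy)]:
        print(name, "gain:", split_gain(crit, 50, 1.0, 50, 0.6))

    # Fixed misclassifications e = 20, growing leaf size n.
    for n in (100, 200, 400):
        print(n, leaf_objective(error, n, 20), leaf_objective(gini, n, 20))
```

In this example the split does not change the majority prediction in either child, so the misclassification error reports essentially no gain, while the strictly concave Gini and entropy criteria report a positive one. Under the assumed leaf objective, the error term stays at $e$ as $n$ grows, whereas the Gini and entropy terms increase, which matches the behavior described in the Figure 2 caption.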

Theorems & Definitions (6)

  • Claim 1
  • Claim 2
  • Claim 3
  • Claim 4
  • Claim 5
  • Claim 6