Optimal Decision Tree Pruning Revisited: Algorithms and Complexity

Juha Harviainen, Frank Sommer, Manuel Sorge, Stefan Szeider

TL;DR

The paper addresses the computational complexity of optimally pruning decision trees via two operations: subtree replacement and subtree raising. It shows that subtree replacement admits polynomial-time solutions and efficient DP-based improvements, while subtree raising is NP-complete and subject to a rich parameterized complexity landscape. The authors identify natural parameter combinations (e.g., budgets $k$ and $t$, feature count $d$, domain size $D$, and path-related thresholds $D_T$ and $d_T$) that separate tractable from intractable cases, providing both XP/FPT algorithms and strong hardness results including W[1]-hardness and ETH-based lower bounds. They complement theory with a proof-of-concept DP implementation and preliminary experiments on benchmark data, revealing that common pruning heuristics often approach optimal tradeoffs between the number of pruned nodes and classification errors. The work advances the theoretical foundations of interpretable ML by clarifying when exact pruning is feasible and how to guide practical pruning strategies.

Abstract

We present a comprehensive classical and parameterized complexity analysis of decision tree pruning operations, extending recent research on the complexity of learning small decision trees. Thereby, we offer new insights into the computational challenges of decision tree simplification, a crucial aspect of developing interpretable and efficient machine learning models. We focus on fundamental pruning operations of subtree replacement and raising, which are used in heuristics. Surprisingly, while optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. Therefore, we identify parameters and combinations thereof that lead to fixed-parameter tractability or hardness, establishing a precise borderline between these complexity classes. For example, while subtree raising is hard for small domain size $D$ or number $d$ of features, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size. We complement our theoretical findings with preliminary experimental results, demonstrating the practical implications of our analysis.

Paper Structure

This paper contains 11 sections, 23 theorems, 4 equations, 9 figures, 3 tables.

Key Result

Theorem 3.1

DTRep can be solved in time $\mathcal{O}((n + \min\{k^2, t^2\}) \cdot s)$.
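To make the polynomial-time claim for subtree replacement concrete, the following is a minimal sketch of a knapsack-style dynamic program over the tree. It is not the authors' algorithm and does not attain the bound of Theorem 3.1; it only illustrates why replacement decomposes over subtrees. The names `Leaf`, `Inner`, and `min_errors_by_size` are hypothetical, and the sketch assumes binary threshold cuts, that a replaced subtree becomes a leaf labeled with the majority class of the examples reaching it, and that original leaves keep their labels.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: int


@dataclass
class Inner:
    feature: int
    threshold: float
    left: "Node"   # receives examples with x[feature] <= threshold
    right: "Node"  # receives all other examples


Node = Union[Leaf, Inner]


def min_errors_by_size(node, examples):
    """Return a list `best` where best[j] is the minimum number of
    misclassified examples over all prunings of this subtree (by subtree
    replacement only) that keep exactly j inner nodes."""
    counts = Counter(y for _, y in examples)
    if isinstance(node, Leaf):
        # Assumption: an original leaf keeps its label.
        return [len(examples) - counts[node.label]]

    # Option 1: replace the whole subtree by a majority-class leaf.
    as_leaf = len(examples) - max(counts.values(), default=0)

    # Option 2: keep this cut and combine the children's tables.
    left_ex = [(x, y) for x, y in examples if x[node.feature] <= node.threshold]
    right_ex = [(x, y) for x, y in examples if x[node.feature] > node.threshold]
    best_left = min_errors_by_size(node.left, left_ex)
    best_right = min_errors_by_size(node.right, right_ex)

    best = [as_leaf] + [float("inf")] * (len(best_left) + len(best_right) - 1)
    for i, err_left in enumerate(best_left):
        for j, err_right in enumerate(best_right):
            size = 1 + i + j  # this node plus the inner nodes kept below it
            best[size] = min(best[size], err_left + err_right)
    return best


# Toy instance (made up for illustration): two binary features, two classes.
tree = Inner(0, 0.5, Leaf(0), Inner(1, 0.5, Leaf(1), Leaf(0)))
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 0), 1), ((1, 1), 0)]
tradeoff = min_errors_by_size(tree, data)
print(tradeoff)  # [2, 1, 0]
```

The returned table is the full tradeoff curve between the number of inner nodes kept and the number of errors, so a decision question with a size budget and an error budget can be answered by a single scan of it.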

Figures (9)

  • Figure 1: Illustration of the two pruning operations. (a) shows the input tree $T$. (b) shows the result of one subtree replacement operation. (c) shows the result of one subtree raising operation.
  • Figure 2: A Hasse diagram of the single parameter relations and results for DTRais${}_{=}$, DTRais${}_{\geq}$: A parameter $p$ has an edge to a lower parameter $q$ if there is a function $f$ such that after straightforward preprocessing we have $q \leq f(p)$. The corresponding theorems and propositions are given in the boxes; for hardness the reference is in the highest box for which hardness holds, for (FPT or XP) tractability the reference is in the lowest box for which tractability holds. BF is for brute-force algorithm.
  • Figure 3: Overview of our results for DTRais${}_{=}$, DTRais${}_{\geq}$. For each box $q$ is the smallest parameter required to achieve an FPT or XP algorithm, and $p$ is the largest parameter such that W[1]-hardness or paraNP-hardness holds. Also, each parameter combination which is not smaller than $p$ and not larger than $q$ leads to the same classification result. Consequently, for parameters $q$ leading to an FPT-algorithm, all parameters which are not smaller than $q$ also lead to an FPT-algorithm. BF is for brute-force algorithm.
  • Figure 4: A visualization of the reduction from the proof of \ref{thm-replacement-ensembles-np-h}. (a) shows a $\kappa$-Biclique instance; a $\kappa$-biclique is depicted in brown. (b) shows the corresponding classification instance. (c) shows both trees $T_P$ and $T_Q$ of the input ensemble $\mathcal{T}$. The brown cuts correspond to cuts which are preserved in the solution ensemble $\mathcal{T}'$.
  • Figure 5: Left: The initial decision tree used for \ref{thm-rais-w-hard-k}, \ref{thm-rais-w-hard-l}, \ref{thm-rais-w-2-hard-l}, and \ref{thm-rais-w-hard-k-zero}. Right: The initial reasonable tree used for \ref{thm-rais-w-hard-k}, \ref{thm-rais-w-hard-l}, \ref{thm-rais-w-2-hard-l}, and \ref{thm-rais-w-hard-k-zero}.
  • ...and 4 more figures
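
The two operations illustrated in Figure 1 can be sketched on a toy tree as follows. This is an illustration under stated assumptions rather than the paper's formal definition: it takes the common C4.5-style reading in which subtree replacement turns an inner node into a majority-class leaf and subtree raising replaces an inner node by the subtree of one of its children. The tuple encoding and the names `prune_at`, `replace_with_leaf`, and `raise_child` are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: ("leaf", label) or ("cut", feature, threshold, left, right).
# An example x goes left when x[feature] <= threshold, otherwise right.


def classify(tree, x):
    while tree[0] == "cut":
        _, feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree[1]


def errors(tree, examples):
    return sum(classify(tree, x) != y for x, y in examples)


def prune_at(tree, path, examples, operation):
    """Apply `operation` to the inner node reached by following `path`,
    a string of "L"/"R" steps from the root, and return the new tree."""
    if not path:
        return operation(tree, examples)
    _, feature, threshold, left, right = tree
    go_left = path[0] == "L"
    # Only the examples routed towards the chosen child matter further down.
    reaching = [(x, y) for x, y in examples if (x[feature] <= threshold) == go_left]
    if go_left:
        return ("cut", feature, threshold,
                prune_at(left, path[1:], reaching, operation), right)
    return ("cut", feature, threshold,
            left, prune_at(right, path[1:], reaching, operation))


def replace_with_leaf(node, examples):
    # Subtree replacement: the whole subtree becomes one majority-class leaf.
    counts = Counter(y for _, y in examples)
    return ("leaf", counts.most_common(1)[0][0] if counts else 0)


def raise_child(side):
    # Subtree raising: the node and its other child vanish; one child moves up.
    def operation(node, examples):
        _, _, _, left, right = node  # assumes `node` is an inner node
        return left if side == "L" else right
    return operation


# Toy instance (made up for illustration).
tree = ("cut", 0, 0.5, ("leaf", 0), ("cut", 1, 0.5, ("leaf", 1), ("leaf", 0)))
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 0), 1), ((1, 1), 0)]

replaced = prune_at(tree, "R", data, replace_with_leaf)
raised = prune_at(tree, "", data, raise_child("R"))
print(errors(tree, data), errors(replaced, data), errors(raised, data))
```

Note that raising re-routes examples through a different cut, so its effect on the error count is not confined to the pruned subtree in the way replacement is; this is one intuition for why the two operations behave so differently in the complexity analysis.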

Theorems & Definitions (42)

  • Theorem 3.1
  • proof
  • Lemma 3.2
  • proof
  • Corollary 3.3
  • proof
  • Theorem 3.4
  • proof
  • Theorem 4.1
  • proof: Proof of \ref{thm-rais-xp-d}
  • ...and 32 more