Optimal Decision Tree Pruning Revisited: Algorithms and Complexity
Juha Harviainen, Frank Sommer, Manuel Sorge, Stefan Szeider
TL;DR
The paper studies the computational complexity of optimally pruning decision trees with two operations: subtree replacement and subtree raising. It shows that optimal pruning by subtree replacement can be done in polynomial time and sped up further with dynamic programming (DP), whereas optimal pruning by subtree raising is NP-complete and its tractability depends on which parameters are bounded. The authors identify natural parameter combinations (e.g., budgets $k$ and $t$, feature count $d$, domain size $D$, and path-related thresholds $D_T$ and $d_T$) that separate tractable from intractable cases, giving XP and FPT algorithms on one side and hardness results on the other, including W[1]-hardness and lower bounds based on the Exponential Time Hypothesis (ETH). They complement the theory with a proof-of-concept DP implementation and preliminary experiments on benchmark data, which indicate that common pruning heuristics often come close to the optimal tradeoff between the number of pruned nodes and the number of classification errors. The work strengthens the theoretical foundations of interpretable ML by clarifying when exact pruning is feasible and how those results can guide practical pruning strategies.
Abstract
We present a comprehensive classical and parameterized complexity analysis of decision tree pruning operations, extending recent research on the complexity of learning small decision trees. Thereby, we offer new insights into the computational challenges of decision tree simplification, a crucial aspect of developing interpretable and efficient machine learning models. We focus on fundamental pruning operations of subtree replacement and raising, which are used in heuristics. Surprisingly, while optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. Therefore, we identify parameters and combinations thereof that lead to fixed-parameter tractability or hardness, establishing a precise borderline between these complexity classes. For example, while subtree raising is hard for small domain size $D$ or number $d$ of features, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size. We complement our theoretical findings with preliminary experimental results, demonstrating the practical implications of our analysis.
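To illustrate why subtree replacement admits a polynomial-time exact solution, the following is a minimal sketch of a bottom-up dynamic program. It is not the authors' algorithm. It assumes binary threshold splits, measures tree size as the number of inner nodes kept, and assumes that a replaced subtree becomes a leaf labeled with the majority class of the training examples reaching it. The names `Node`, `leaf_errors`, and `prune_table` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import inf
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # examples with x[feature] <= threshold
    right: Optional["Node"] = None  # examples with x[feature] > threshold
    label: Optional[int] = None     # class label, used only at leaves


def leaf_errors(labels):
    """Errors of the best single leaf: everything outside the majority class."""
    return len(labels) - max(Counter(labels).values()) if labels else 0


def prune_table(node, data):
    """Return best, where best[s] is the minimum number of misclassified
    examples if the subtree rooted at node keeps exactly s inner nodes."""
    labels = [y for _, y in data]
    if node.feature is None:
        # An original leaf keeps its label and contributes no inner nodes.
        return [sum(1 for y in labels if y != node.label)]

    left_data = [(x, y) for x, y in data if x[node.feature] <= node.threshold]
    right_data = [(x, y) for x, y in data if x[node.feature] > node.threshold]
    left_best = prune_table(node.left, left_data)
    right_best = prune_table(node.right, right_data)

    # s = 0: replace the whole subtree by a majority-class leaf.
    best = [leaf_errors(labels)] + [inf] * (len(left_best) + len(right_best) - 1)
    # s >= 1: keep this node and split the remaining budget between children.
    for a, err_left in enumerate(left_best):
        for b, err_right in enumerate(right_best):
            best[1 + a + b] = min(best[1 + a + b], err_left + err_right)
    return best


# Toy tree: root tests x[0] <= 5; its left child tests x[0] <= 2.
root = Node(
    feature=0, threshold=5,
    left=Node(feature=0, threshold=2, left=Node(label=0), right=Node(label=1)),
    right=Node(label=1),
)
data = [((1,), 0), ((2,), 0), ((3,), 0), ((4,), 1),
        ((6,), 1), ((7,), 1), ((8,), 0)]
table = prune_table(root, data)
```

On this toy input, `table` is `[3, 2, 2]`: keeping only the root split already reaches the same two training errors as the full tree, so replacing the left subtree by a leaf costs nothing. Merging the children's tables is the standard tree-knapsack step, which keeps the total work polynomial in the tree size and the number of examples.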
