Optimal Classification Trees for Continuous Feature Data Using Dynamic Programming with Branch-and-Bound

Catalin E. Brita, Jacobus G. M. van der Linden, Emir Demirović

TL;DR

A novel algorithm is proposed that optimizes trees directly on continuous feature data using dynamic programming with branch-and-bound, improving runtime by one or more orders of magnitude over state-of-the-art optimal methods and improving test accuracy over greedy heuristics.

Abstract

Computing an optimal classification tree that provably maximizes training performance within a given size limit is NP-hard, and in practice, most state-of-the-art methods do not scale beyond computing optimal trees of depth three. Therefore, most methods rely on a coarse binarization of continuous features to maintain scalability. We propose a novel algorithm that optimizes trees directly on the continuous feature data using dynamic programming with branch-and-bound. We develop new pruning techniques that eliminate many sub-optimal splits from the search when they are similar to previously computed splits, and we provide an efficient subroutine for computing optimal depth-two trees. Our experiments demonstrate that these techniques improve runtime by one or more orders of magnitude over state-of-the-art optimal methods and improve test accuracy by 5% over greedy heuristics.
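
To make the general idea concrete, here is a minimal Python sketch of depth-limited optimal tree search with dynamic programming and branch-and-bound on raw continuous features. It rests on our own simplifying assumptions: the objective is the training misclassification count, candidate thresholds lie between consecutive distinct feature values, and subproblems are memoized by (instance set, remaining depth). It is not ConTree; it omits the paper's similarity-based pruning and depth-two subroutine, and the name `optimal_tree_cost` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

def optimal_tree_cost(X: Sequence[Sequence[float]], y: Sequence[int], depth: int) -> int:
    """Minimum number of training misclassifications achievable by a binary
    tree with at most `depth` levels of threshold splits (x[f] <= t goes left).
    Hypothetical illustration, not the paper's algorithm."""
    cache: Dict[Tuple[Tuple[int, ...], int], int] = {}  # exact optima only

    def leaf_cost(idx: Tuple[int, ...]) -> int:
        # Errors of a leaf that predicts the majority label of its instances.
        counts: Dict[int, int] = {}
        for i in idx:
            counts[y[i]] = counts.get(y[i], 0) + 1
        return len(idx) - max(counts.values(), default=0)

    def solve(idx: Tuple[int, ...], d: int, ub: int) -> int:
        # Contract: returns the exact optimum if it is < ub, else some value >= ub.
        key = (idx, d)
        if key in cache:
            return cache[key]
        best = leaf_cost(idx)
        if d == 0 or best == 0:
            cache[key] = best
            return best
        target = min(ub, best)  # a split must score strictly below this to matter
        for f in range(len(X[0])):
            order = sorted(idx, key=lambda i: X[i][f])
            for k in range(1, len(order)):
                # Candidate thresholds lie between consecutive distinct values.
                if X[order[k - 1]][f] == X[order[k]][f]:
                    continue
                left, right = tuple(sorted(order[:k])), tuple(sorted(order[k:]))
                left_cost = solve(left, d - 1, target)
                if left_cost >= target:
                    continue  # bound: the left subtree alone cannot improve
                right_cost = solve(right, d - 1, target - left_cost)
                if left_cost + right_cost < target:
                    best = target = left_cost + right_cost
        if best < ub:
            cache[key] = best  # provably exact, so safe to reuse for any bound
        return best

    return solve(tuple(range(len(y))), depth, len(y) + 1)
```

Passing the tightened budget `target - left_cost` to the right subtree is what makes this branch-and-bound: a subproblem is only solved exactly when its result could still improve the current best, and only exact results are memoized.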

Paper Structure (34 sections, 4 theorems, 12 equations, 3 figures, 6 tables, 2 algorithms)

Key Result

Theorem 1

Let $\mathcal{UB}$ be the best solution so far or the score needed to obtain a better solution. Let $\theta_{\tau} = \operatorname{Split}(\mathcal{D}, d, f, \tau)$ be the optimal misclassification score for the subtree when branching on $f$ with threshold $\tau$. Then any other threshold $\tau'$ wit…
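
The statement of Theorem 1 is cut off here, so its exact pruning condition is not reproduced. One bound of the same flavor follows from a short argument, which we give as our own: removing one instance from a subtree's data lowers its optimal misclassification score by at most one, and adding one never lowers it. So if thresholds $\tau$ and $\tau'$ on the same feature differ by $k$ instances, then $\theta_{\tau'} \ge \theta_{\tau} - k$, and every $\tau'$ with $k \le \theta_{\tau} - \mathcal{UB}$ cannot score strictly below $\mathcal{UB}$. The hypothetical helper below applies this while scanning candidate thresholds left to right, where a position is the number of instances sent to the left child; `score` stands in for an expensive subtree solve.

```python
from typing import Callable, List, Tuple

def scan_with_neighbor_pruning(
    positions: List[int], score: Callable[[int], int], ub: int
) -> Tuple[int, int]:
    """Scan candidate split positions left to right, skipping ones the bound rules out.

    positions: strictly increasing numbers of instances sent to the left child.
    score: returns the optimal subtree score for a position (assumed expensive).
    ub: a split is only useful if its score is strictly below this value.
    Returns (best score found, or ub if nothing beats it; number of score calls).
    """
    best, calls, i = ub, 0, 0
    while i < len(positions):
        p = positions[i]
        theta = score(p)
        calls += 1
        best = min(best, theta)
        # Any later position q satisfies score(q) >= theta - (q - p), so it
        # cannot beat `best` while q - p <= theta - best.
        radius = theta - best
        i += 1
        while i < len(positions) and positions[i] - p <= radius:
            i += 1  # pruned without calling score()
    return best, calls
```

Every position this scan skips has a score no lower than the returned value, so the result always equals the minimum of `ub` and all scores; only the number of expensive `score` calls shrinks.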

Figures (3)

  • Figure 1: The split points $u$ and $v$ for which the scores $\theta_u$ and $\theta_v$ are calculated are shown in yellow. The newly pruned values are shown in red. Green indicates the remaining split points for further search. Blue indicates unaffected values outside $[i..j]$.
  • Figure 2: The number of $\operatorname{D2Split}$ calls for no pruning, the three pruning techniques, and all three combined.
  • Figure 3: The distance to the optimal solution for ConTree's best depth-four solution over time for three datasets. The optimal solution is typically found significantly earlier than the end of the search (the final cross).
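
Figure 2 counts calls to $\operatorname{D2Split}$, the depth-two subroutine. To make concrete what such a subroutine returns, here is a plain reference sketch of our own, not the paper's optimized D2Split: the optimal depth-one cost of an instance set is found with one sorted pass per feature that keeps running class counts, and the optimal depth-two cost is the best sum of two depth-one costs over all root splits. The names `best_depth_one` and `best_depth_two` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

def best_depth_one(X: Sequence[Sequence[float]], y: Sequence[int], idx: List[int]) -> int:
    """Fewest misclassifications on `idx` by a leaf or a single threshold split."""
    total = Counter(y[i] for i in idx)
    best = len(idx) - max(total.values(), default=0)  # cost of a single leaf
    for f in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][f])
        left: Counter = Counter()
        for k in range(1, len(order)):
            left[y[order[k - 1]]] += 1  # instance order[k-1] moves to the left side
            if X[order[k - 1]][f] == X[order[k]][f]:
                continue  # no threshold separates equal feature values
            right = total - left  # class counts of the remaining instances
            cost = (k - max(left.values())) + (len(order) - k - max(right.values()))
            best = min(best, cost)
    return best

def best_depth_two(X: Sequence[Sequence[float]], y: Sequence[int]) -> int:
    """Fewest misclassifications by a tree of depth at most two."""
    idx = list(range(len(y)))
    best = best_depth_one(X, y, idx)  # shallower trees are allowed too
    for f in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][f])
        for k in range(1, len(order)):
            if X[order[k - 1]][f] == X[order[k]][f]:
                continue
            # Root split on feature f: solve each child as a depth-one problem.
            cost = best_depth_one(X, y, order[:k]) + best_depth_one(X, y, order[k:])
            best = min(best, cost)
    return best
```

The running-count pass is the part that makes depth-one cheap: each candidate threshold costs a constant number of count updates per class instead of a fresh pass over the data. A specialized subroutine would also share work across root splits, which this sketch does not attempt.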

Theorems & Definitions (8)

  • Theorem 1
  • proof
  • Corollary 1
  • Example 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof