Optimal Classification Trees for Continuous Feature Data Using Dynamic Programming with Branch-and-Bound

Catalin E. Brita; Jacobus G. M. van der Linden; Emir Demirović

TL;DR

The authors propose a novel algorithm that optimizes trees directly on continuous feature data using dynamic programming with branch-and-bound. It improves runtime by one or more orders of magnitude over state-of-the-art optimal methods and improves test accuracy over greedy heuristics.
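The core idea can be illustrated with a small sketch: dynamic programming with branch-and-bound, using thresholds taken directly from continuous values. This is not ConTree. It is a hypothetical, unoptimized search in which `optimal_tree_cost` (an illustrative name) does three things. First, it memoizes the optimal misclassification count for each (instance subset, remaining depth) pair. Second, it tries every midpoint between consecutive distinct feature values as a threshold. Third, it skips the right subtree whenever the left subtree alone already costs at least as much as the best option found so far. It omits the paper's pruning techniques and its depth-two subroutine.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def optimal_tree_cost(X: Sequence[Sequence[float]], y: Sequence[int], max_depth: int) -> int:
    """Minimum training misclassifications of any binary tree of depth <= max_depth.

    Hypothetical sketch: DP over (instance subset, depth) plus a simple
    branch-and-bound cut. Not the paper's ConTree algorithm.
    """
    n_features = len(X[0]) if X else 0

    def leaf_cost(idx: Tuple[int, ...]) -> int:
        # Misclassifications of a single leaf that predicts the majority label.
        counts = {}
        for i in idx:
            counts[y[i]] = counts.get(y[i], 0) + 1
        return len(idx) - max(counts.values(), default=0)

    @lru_cache(maxsize=None)  # memoization: each (subset, depth) is solved once
    def solve(idx: Tuple[int, ...], depth: int) -> int:
        best = leaf_cost(idx)  # upper bound: stop here and predict
        if depth == 0 or best == 0:
            return best
        for f in range(n_features):
            # Candidate thresholds lie between consecutive distinct values.
            values = sorted({X[i][f] for i in idx})
            for lo, hi in zip(values, values[1:]):
                tau = (lo + hi) / 2
                left = tuple(i for i in idx if X[i][f] <= tau)
                right = tuple(i for i in idx if X[i][f] > tau)
                left_cost = solve(left, depth - 1)
                if left_cost >= best:  # bound: the right side cannot help
                    continue
                cost = left_cost + solve(right, depth - 1)
                if cost < best:
                    best = cost
                    if best == 0:
                        return 0
        return best

    # Index tuples stay sorted, so they are canonical cache keys.
    return solve(tuple(range(len(y))), max_depth)


xor_X = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_y = [0, 1, 1, 0]
print(optimal_tree_cost(xor_X, xor_y, 1), optimal_tree_cost(xor_X, xor_y, 2))  # 2 0
```

The sketch caches exact subtree costs rather than costs computed under a budget. This keeps every cache entry valid no matter which bound was active when it was stored. The trade-off is that the bound only prunes at the level of individual splits.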

Abstract

Computing an optimal classification tree that provably maximizes training performance within a given size limit is NP-hard, and in practice most state-of-the-art methods do not scale beyond computing optimal trees of depth three. Therefore, most methods rely on a coarse binarization of continuous features to maintain scalability. We propose a novel algorithm that optimizes trees directly on the continuous feature data using dynamic programming with branch-and-bound. We develop new pruning techniques that eliminate many sub-optimal splits from the search when they are similar to previously computed splits, and we provide an efficient subroutine for computing optimal depth-two trees. Our experiments demonstrate that these techniques improve runtime by one or more orders of magnitude over state-of-the-art optimal methods and improve test accuracy by 5% over greedy heuristics.
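The internals of the depth-two subroutine mentioned above are not given in this section. The following is therefore only a straightforward baseline for the same quantity, assuming misclassification count as the objective. `best_depth1` (a hypothetical name) finds the best single split using one sorted scan per feature with running class counts. `best_depth2` (also hypothetical) pairs the best depth-one subtrees on both sides of every root threshold. The sketch only shows what a depth-two routine computes. It makes no claim about how the paper's D2Split subroutine achieves its efficiency.

```python
from collections import Counter
from typing import List, Sequence


def best_depth1(X: Sequence[Sequence[float]], y: Sequence[int], idx: List[int]) -> int:
    """Fewest misclassifications on instances idx using at most one split."""
    total = Counter(y[i] for i in idx)
    best = len(idx) - max(total.values(), default=0)  # a single majority leaf
    n_features = len(X[0]) if idx else 0
    for f in range(n_features):
        order = sorted(idx, key=lambda i: X[i][f])
        left = Counter()
        for k in range(len(order) - 1):
            left[y[order[k]]] += 1
            # Thresholds only make sense between distinct feature values.
            if X[order[k]][f] == X[order[k + 1]][f]:
                continue
            right = total - left  # class counts of the instances after position k
            n_left = k + 1
            cost = (n_left - max(left.values())) + (len(order) - n_left - max(right.values()))
            best = min(best, cost)
    return best


def best_depth2(X: Sequence[Sequence[float]], y: Sequence[int]) -> int:
    """Fewest misclassifications of any tree of depth at most two (all roots tried)."""
    idx = list(range(len(y)))
    best = best_depth1(X, y, idx)  # trees of depth <= 1 are also depth <= 2
    n_features = len(X[0]) if idx else 0
    for f in range(n_features):
        order = sorted(idx, key=lambda i: X[i][f])
        for k in range(len(order) - 1):
            if X[order[k]][f] == X[order[k + 1]][f]:
                continue
            # Each child independently takes its best depth-one subtree.
            cost = best_depth1(X, y, order[:k + 1]) + best_depth1(X, y, order[k + 1:])
            best = min(best, cost)
    return best


X = [[1], [2], [3], [4], [5]]
y = [0, 1, 0, 1, 0]
print(best_depth1(X, y, list(range(5))), best_depth2(X, y))  # 2 1
```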

Paper Structure (34 sections, 4 theorems, 12 equations, 3 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Heuristics
Optimal
General-purpose solvers
Specialized algorithms
Continuous features
Summary
Preliminaries
Notation
Problem definition
Similarity lower bounding
The ConTree Algorithm
Pruning Techniques
Neighborhood pruning (NB)
(19 further sections not listed)

Key Result

Theorem 1

Let $\mathcal{UB}$ be the score of the best solution found so far, or the score needed to obtain a better solution. Let $\theta_{\tau} = \operatorname{Split}(\mathcal{D}, d, f, \tau)$ be the optimal misclassification score for the subtree when branching on feature $f$ with threshold $\tau$. Then any other threshold $\tau'$ with …
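The statement of Theorem 1 is cut off above, so its exact pruning condition is not reproduced here. The intuition behind this kind of neighborhood pruning can be sketched under one stated assumption, in the spirit of the similarity lower bounding listed under Preliminaries. Moving the threshold from sorted position $i$ to position $j$ shifts $|i - j|$ instances between the two branches. Each shifted instance can lower the optimal misclassification score by at most one, so $\theta_j \ge \theta_i - |i - j|$. If a split only helps when it scores strictly below $\mathcal{UB}$, every $j$ with $|i - j| \le \theta_i - \mathcal{UB}$ can be skipped. The helpers `leaf_errors`, `split_score`, and `pruned_window` are hypothetical names. For simplicity, `split_score` uses single leaves as children.

```python
from collections import Counter
from typing import List, Tuple


def leaf_errors(labels: List[int]) -> int:
    """Misclassifications of one leaf that predicts the majority label."""
    return len(labels) - max(Counter(labels).values(), default=0)


def split_score(labels_sorted: List[int], i: int) -> int:
    """Score of a split that sends the first i instances (sorted by feature) left."""
    return leaf_errors(labels_sorted[:i]) + leaf_errors(labels_sorted[i:])


def pruned_window(i: int, theta_i: int, ub: int, lo: int, hi: int) -> Tuple[int, int]:
    """Inclusive range of positions j in [lo, hi] that cannot score below ub.

    Assumption: theta_j >= theta_i - |i - j|, since moving the threshold by
    |i - j| positions shifts that many instances between the branches.
    An empty range is returned as (start, end) with start > end.
    """
    radius = theta_i - ub
    if radius < 0:
        return (i + 1, i)  # theta_i < ub: nothing around i can be skipped
    return (max(lo, i - radius), min(hi, i + radius))


labels = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]  # class labels sorted by feature value
theta_3 = split_score(labels, 3)         # 0 + 2 = 2 misclassifications
print(pruned_window(3, theta_3, ub=1, lo=1, hi=len(labels) - 1))  # (2, 4)
```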

Figures (3)

Figure 1: The split points $u$ and $v$ for which the scores $\theta_u$ and $\theta_v$ are computed are shown in yellow. Newly pruned values are shown in red, and green indicates the split points that remain for further search. Blue indicates unaffected values outside of $[i..j]$.
Figure 2: The number of $\operatorname{D2Split}$ calls for no pruning, the three pruning techniques, and all three combined.
Figure 3: The distance to the optimal solution for ConTree's best depth-four solution over time for three datasets. The optimal solution is typically found significantly earlier than the end of the search (the final cross).
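Figure 1 suggests a search that evaluates a few split points inside an interval $[i..j]$, prunes their neighborhoods, and continues on the points that remain. The figure does not say where $u$ and $v$ are placed. The sketch below therefore makes a hypothetical assumption: $u$ and $v$ are the two ends of the current interval. `search_thresholds` (an illustrative name) takes an expensive `score(p)` callback, standing in for something like a depth-two subtree computation. It assumes a score can drop by at most one per position moved. It returns the minimum score, a position achieving it, and the number of `score` calls made, which is the kind of count Figure 2 reports for $\operatorname{D2Split}$.

```python
import math
from typing import Callable, Optional, Tuple


def search_thresholds(score: Callable[[int], int], lo: int, hi: int) -> Tuple[float, Optional[int], int]:
    """Minimum of score(p) over p in [lo, hi], skipping provably useless positions.

    Assumption: score(q) >= score(p) - |p - q| for all positions p, q.
    """
    best, best_pos, calls = math.inf, None, 0
    i, j = lo, hi
    while i <= j:
        theta_u = score(i)  # evaluate the left end u = i
        calls += 1
        if theta_u < best:
            best, best_pos = theta_u, i
        if i == j:
            break
        theta_v = score(j)  # evaluate the right end v = j
        calls += 1
        if theta_v < best:
            best, best_pos = theta_v, j
        # Positions within distance theta - best of u (or v) cannot beat best.
        i += theta_u - best + 1
        j -= theta_v - best + 1
    return best, best_pos, calls


scores = [5, 4, 4, 3, 2, 3, 4, 5, 5, 4]  # a toy score per threshold position
print(search_thresholds(scores.__getitem__, 0, len(scores) - 1))  # (2, 4, 7)
```

Every skipped position has a score at least equal to the best value found when it was skipped. The returned minimum therefore equals the minimum over all positions; only the number of evaluations changes.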

Theorems & Definitions (8)

Theorem 1
proof
Corollary 1
Example 1
Theorem 2
proof
Theorem 3
proof
