On Computing Optimal Tree Ensembles

Christian Komusiewicz; Pascal Kunz; Frank Sommer; Manuel Sorge

TL;DR

It is shown that dynamic programming, which has been applied successfully to computing decision trees, may also be viable for tree ensembles, providing an $\ell^n \cdot poly$-time algorithm, where $\ell$ is the number of trees.

Abstract

Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression. Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area. Mainly, we provide two novel algorithms and corresponding lower bounds. First, we are able to carry over and substantially improve on tractability results for decision trees: We obtain an algorithm that, given a training-data set and a size bound $S \in \mathbb{R}$, computes a tree ensemble of size at most $S$ that classifies the data correctly. The algorithm runs in $(4 \delta D S)^S \cdot poly$-time, where $D$ is the largest domain size, $\delta$ is the largest number of features in which two examples differ, $n$ is the number of input examples, and $poly$ is a polynomial of the input size. For decision trees, that is, ensembles of size 1, we obtain a running time of $(\delta D s)^s \cdot poly$, where $s$ is the size of the tree. To obtain these algorithms, we introduce the witness-tree technique, which seems promising for practical implementations. Secondly, we show that dynamic programming, which has been applied successfully to computing decision trees, may also be viable for tree ensembles, providing an $\ell^n \cdot poly$-time algorithm, where $\ell$ is the number of trees. Finally, we compare the number of cuts necessary to classify training data sets for decision trees and tree ensembles, showing that ensembles may need exponentially fewer cuts for an increasing number of trees.

Paper Structure (12 sections, 22 theorems, 8 equations, 7 figures, 1 algorithm)

Introduction
Preliminaries
Decision Trees Versus Tree Ensembles
The Witness-Tree Algorithm
Tight Exponential-Time Algorithm
An Efficient Algorithm for a Small Number of Examples
A Matching Lower Bound
Extensions
Non-Binary Classification
Error Minimization
Enumeration
Outlook

Key Result

Theorem 3.2

Any training data set that can be classified by a decision tree ensemble consisting of $\ell$ trees, each of size at most $s$, can also be classified by a decision tree of size $(s+1)^\ell -1$.
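The bound in Theorem 3.2 can be illustrated with a product construction: graft a copy of the next tree onto every leaf of the current one, carrying the votes collected so far. The following is a minimal sketch under stated assumptions, not necessarily the construction used in the paper's proof. It assumes binary labels, threshold cuts, majority voting with ties resolved to class 0, and that the size of a tree is its number of inner nodes (cuts). The names `Leaf`, `Cut`, `merge`, and the toy ensemble are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Sequence, Union


@dataclass(frozen=True)
class Leaf:
    label: int  # class label, 0 or 1 (binary labels are an assumption)


@dataclass(frozen=True)
class Cut:
    feature: int      # index of the feature that is tested
    threshold: float  # go left if x[feature] <= threshold, else right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Cut]


def size(t: Tree) -> int:
    """Number of inner nodes (cuts); assumed to be the size measure."""
    return 0 if isinstance(t, Leaf) else 1 + size(t.left) + size(t.right)


def classify(t: Tree, x: Sequence[float]) -> int:
    """Follow the cuts from the root to a leaf and return its label."""
    while isinstance(t, Cut):
        t = t.left if x[t.feature] <= t.threshold else t.right
    return t.label


def ensemble_classify(trees: Sequence[Tree], x: Sequence[float]) -> int:
    """Majority vote; ties go to class 0 (an assumption of this sketch)."""
    ones = sum(classify(t, x) for t in trees)
    return 1 if 2 * ones > len(trees) else 0


def merge(trees: Sequence[Tree], ones: int = 0, i: int = 0) -> Tree:
    """Single tree simulating the ensemble trees[i:], given `ones` votes
    for class 1 already collected from trees[:i]."""
    if i == len(trees):
        return Leaf(1 if 2 * ones > len(trees) else 0)

    def graft(t: Tree) -> Tree:
        # Replace every leaf of trees[i] by the merged remainder.
        if isinstance(t, Leaf):
            return merge(trees, ones + t.label, i + 1)
        return Cut(t.feature, t.threshold, graft(t.left), graft(t.right))

    return graft(trees[i])


# Toy ensemble: l = 3 trees, each with s = 1 cut on a different feature.
trees = [Cut(f, 0.5, Leaf(0), Leaf(1)) for f in range(3)]
merged = merge(trees)

# Theorem 3.2 bound for s = 1, l = 3: (1 + 1)**3 - 1 = 7 cuts.
print(size(merged))
# The merged tree agrees with the majority vote on all binary inputs.
print(all(classify(merged, x) == ensemble_classify(trees, x)
          for x in product([0, 1], repeat=3)))
```

Each tree with $s$ cuts has $s+1$ leaves, so grafting $\ell$ such trees yields $(s+1)^\ell$ leaves and therefore $(s+1)^\ell - 1$ cuts, which matches the bound. The merged tree is generally not minimal: subtrees whose outcome is already decided by the collected votes could be pruned.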

Figures (7)

Figure 1: The tree $T_i$ of $\mathcal{T}$.
Figure 2: Two ways of refining a tree: On the left a new root $r$ and a new leaf $v$ are introduced. On the right, an existing edge between the subtrees $T_1$ and $T_2$ is subdivided with a vertex $u$ and a new leaf $v$ is introduced.
Figure 3: Example for tree $T_1$ (left) and $T_3$ (right) in subcase (1a) in the proof of \ref{lem:reorder-refinements}.
Figure 4: Example for tree $T_1$ (left) and $T_3$ (right) in subcase (1b) in the proof of \ref{lem:reorder-refinements}.
Figure 5: Tree $T_1$ (left), $T_2$ (middle), and $T_3$ (right) in subcase (2a) in the proof of \ref{lem:reorder-refinements}.
...and 2 more figures

Theorems & Definitions (33)

Theorem 3.2
proof
Theorem 3.3
proof
Theorem 4.1
Lemma 4.2
proof
proof of \ref{thm:witness-tree-algo}
Corollary 4.3
Theorem 4.4
...and 23 more
