On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest

Tatsuya Akutsu, Avraham A. Melkman, Atsuhiro Takasu

TL;DR

This paper analyzes the prediction phase of random forests for binary classification over binary inputs and asks when a bag of decision trees can be represented by a smaller bag. It proves that the majority function on $n=2m-1$ binary variables can be represented by a bag of $2m-1-2c$ trees, each of size polynomial in $n$ for any fixed constant $c$. Beyond exact representations, it shows that a general bag of trees can likewise be reduced to fewer trees if a small classification error is allowed, again with polynomial-size trees when $c$ is constant. The work also gives a construction for $k$-out-of-$n$ functions and describes a prefix-based method for obtaining the main majority-function result. Open questions remain about achieving error-free reductions with optimal bounds and about narrowing the gap between upper and lower bounds when $c$ is not constant.

Abstract

In this paper, we focus on the prediction phase of a random forest and study the problem of representing a bag of decision trees using a smaller bag of decision trees. We consider only binary decision problems on the binary domain and simple decision trees in which each internal node queries the Boolean value of a single variable. As a main result, we show that the majority function of $n$ variables can be represented by a bag of $T$ ($< n$) decision trees, each of polynomial size, if $n-T$ is a constant, where $n$ and $T$ must be odd (in order to avoid ties). We also show that a bag of $n$ decision trees can be represented by a bag of $T$ decision trees, each of polynomial size, if $n-T$ is a constant and a small classification error is allowed. A related result on $k$-out-of-$n$ functions is also presented.
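
To make the setting concrete, here is a minimal Python sketch of a bag of simple decision trees that predicts by majority vote. The nested-tuple tree encoding and the names `evaluate`, `bag_predict`, and `trivial_bag` are assumptions made for illustration and do not come from the paper. The sketch checks the baseline that the results above improve on: the majority function on $n$ variables is represented by $n$ one-node trees, where tree $i$ simply outputs $x_i$.

```python
from itertools import product

# Hypothetical encoding (not from the paper): a leaf is the int 0 or 1;
# an internal node is a tuple (i, low, high), meaning
# "query x[i]; continue in `low` if it is 0, in `high` if it is 1".
def evaluate(tree, x):
    while not isinstance(tree, int):
        i, low, high = tree
        tree = high if x[i] else low
    return tree

def bag_predict(bag, x):
    # The bag outputs 1 iff a strict majority of its trees output 1.
    votes = sum(evaluate(tree, x) for tree in bag)
    return 1 if 2 * votes > len(bag) else 0

def majority(x):
    return 1 if 2 * sum(x) > len(x) else 0

def trivial_bag(n):
    # n trees, each with a single internal node that returns x[i].
    return [(i, 0, 1) for i in range(n)]

n = 5  # odd, so no ties can occur
bag = trivial_bag(n)
agrees = all(
    bag_predict(bag, x) == majority(x) for x in product((0, 1), repeat=n)
)
print(agrees)
```

The question studied in the paper is how much larger the individual trees must become when the bag is restricted to $T < n$ trees; this sketch only illustrates the starting point with $T = n$.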

Paper Structure (8 sections, 6 theorems, 21 equations, 1 figure, 1 table)


Key Result

Theorem 3

For fixed $k$, $C_n(k;\mathbf{x})$ can be represented by a bag of decision trees each of which has size $O(n^{|m-k|+1})$.
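
The following Python sketch illustrates the quantities in this statement under two assumptions that this page does not spell out: that $C_n(k;\mathbf{x})$ is the $k$-out-of-$n$ function, equal to 1 exactly when at least $k$ of the $n$ input bits are 1, and that $n = 2m-1$ as in the majority-function result. The helper names are hypothetical. It checks that $k = m$ recovers the majority function and tabulates the exponent $|m-k|+1$ from the size bound for a small $n$.

```python
from itertools import product

def k_out_of_n(k, x):
    # Assumed meaning of C_n(k; x): 1 iff at least k of the bits are 1.
    return 1 if sum(x) >= k else 0

def majority(x):
    return 1 if 2 * sum(x) > len(x) else 0

def size_exponent(n, k):
    # Exponent in the bound O(n^{|m-k|+1}), assuming n = 2m - 1.
    m = (n + 1) // 2
    return abs(m - k) + 1

n = 7
m = (n + 1) // 2  # m = 4

# With k = m, the threshold function coincides with majority.
majority_is_m_out_of_n = all(
    k_out_of_n(m, x) == majority(x) for x in product((0, 1), repeat=n)
)

# The exponent grows as k moves away from m in either direction.
exponents = {k: size_exponent(n, k) for k in range(1, n + 1)}
print(majority_is_m_out_of_n, exponents)
```

Under these assumptions the bound is smallest at $k = m$, where the exponent is 1, and grows by one for each step that $k$ moves away from $m$.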

Figures (1)

  • Figure 1: (A) Decision tree representing the majority function on 3 variables, and (B) Random forest (bag of trees) representing the majority function on 5 variables.
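
As a sketch of what panel (A) describes, the Python snippet below builds one possible decision tree for the majority function on 3 variables and verifies it on all 8 inputs. The nested-tuple encoding is an assumption made for illustration, and the exact shape of the tree drawn in the figure may differ.

```python
from itertools import product

# Hypothetical encoding: a leaf is the int 0 or 1; an internal node is
# (i, low, high): query x[i], go to `low` on 0 and to `high` on 1.
def evaluate(tree, x):
    while not isinstance(tree, int):
        i, low, high = tree
        tree = high if x[i] else low
    return tree

def count_internal(tree):
    # Number of internal (query) nodes in the tree.
    if isinstance(tree, int):
        return 0
    return 1 + count_internal(tree[1]) + count_internal(tree[2])

# Query x0, then x1; x2 is consulted only when x0 and x1 disagree.
MAJ3 = (0,
        (1, 0, (2, 0, 1)),   # x0 = 0: need both x1 and x2
        (1, (2, 0, 1), 1))   # x0 = 1: need x1 or x2

correct = all(
    evaluate(MAJ3, x) == (1 if sum(x) >= 2 else 0)
    for x in product((0, 1), repeat=3)
)
print(correct, count_internal(MAJ3))
```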

Theorems & Definitions (18)

  • Definition 1
  • Definition 2
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • Proposition 5
  • proof
  • Remark 6
  • Example 7
  • ...and 8 more