The Role of Depth, Width, and Tree Size in Expressiveness of Deep Forest

Shen-Huan Lyu, Jin-Hui Wu, Qin-Cheng Zheng, Baoliu Ye

TL;DR

This work analyzes the expressiveness of deep forests by formalizing approximation complexity and focusing on the roles of depth, width, and tree size. Using generalized parity functions, it proves that depth can be exponentially more powerful than both width and single-tree size, providing constructive upper bounds and matching lower bounds. The authors also demonstrate this depth advantage experimentally on synthetic parity tasks with product distributions and on real-world datasets, illustrating practical implications for model design. Overall, the paper delivers the first theoretical explanation for depth advantages in deep forests and lays groundwork for improved learning efficiency and robustness in tree-based ensembles.

Abstract

Random forests are classical ensemble algorithms that construct multiple randomized decision trees and aggregate their predictions using naive averaging. Zhou and Feng (2019) further propose a deep forest algorithm with multi-layer forests, which outperforms random forests in various tasks. In practice, the performance of deep forests depends on three hyperparameters: depth, width, and tree size, but little is known about the theory behind this dependence. This work provides the first upper and lower bounds on the approximation complexity of deep forests with respect to these three hyperparameters. Our results confirm the distinctive role of depth, which can exponentially enhance the expressiveness of deep forests compared with width and tree size. Experiments confirm the theoretical findings.
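
The depth-versus-tree-size gap described above can be illustrated with a toy sketch. It rests on assumptions that are not taken from the paper: it uses the classical Boolean parity on $\{0,1\}^d$ rather than the paper's generalized parity functions, and the cascade below (hypothetical helpers `xor_tree` and `cascade_parity`) is a simplified stand-in for a layered model, not the authors' construction. A single decision tree that represents Boolean parity exactly must query every bit on every path, so it has $2^d$ leaves, whereas a cascade in which each layer sees the previous layer's output plus one raw feature needs only a 4-leaf tree per layer.

```python
from itertools import product


def parity(bits):
    """Classical Boolean parity: 1 if an odd number of bits are set, else 0."""
    return sum(bits) % 2


def single_tree_leaves(d):
    """Leaf count of an exact single tree: every path fixes all d bits."""
    return 2 ** d


def xor_tree(a, b):
    """A depth-2 decision tree with 4 leaves that computes a XOR b."""
    if a == 0:
        return 0 if b == 0 else 1
    return 1 if b == 0 else 0


def cascade_parity(bits):
    """Toy cascade (assumption): each layer combines the previous
    layer's output with one raw input feature via a 4-leaf tree."""
    out = bits[0]
    for b in bits[1:]:
        out = xor_tree(out, b)
    return out


def cascade_leaves(d):
    """Total leaves across the d - 1 layers of the toy cascade."""
    return 4 * (d - 1)


d = 10
# The cascade agrees with parity on every one of the 2**d inputs.
assert all(cascade_parity(x) == parity(x) for x in product((0, 1), repeat=d))
print(single_tree_leaves(d), cascade_leaves(d))  # 1024 36
```

In this toy setting the total leaf count grows linearly with the number of layers but exponentially for a single tree, which is the qualitative shape of the separation the abstract claims; the paper's actual bounds and function class may differ.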

Paper Structure

This paper contains 20 sections, 6 theorems, 35 equations, 9 figures, and 3 tables.

Key Result

Lemma 1

Let $\bm{x}, \bm{y} \in \mathcal{X}$ be two input vectors, let $r>0$ be a positive real number, and let $f \in \mathcal{F}_\mathcal{X}$ be a classification mapping. Then either $C_{r,f}(\bm{x}) = C_{r,f}(\bm{y})$ or $C_{r,f}(\bm{x}) \cap C_{r,f}(\bm{y}) = \varnothing$.

Figures (9)

  • Figure 1: Illustration of the deep forest architecture.
  • Figure 2: A $2$-dimensional demonstration of the construction of the deep tree expressing the parity function. Circles and crosses at integral points are positive and negative classes, respectively. Rectangles indicate tree leaves.
  • Figure 3: A $2$-dimensional demonstration of the relationship between the number of correct points and the number of mistaken points in a tree leaf, where rectangles, circles, and crosses represent tree leaves, correct points, and mistaken points, respectively.
  • Figure 4: A demonstration of expressing a decision tree using a deep tree. Circles, crosses, and triangles represent positive, negative, and unlabeled points, respectively. Rectangles indicate tree leaves.
  • Figure 5: A 2-dimensional demonstration of parity functions with the uniform distribution and the constructed product distribution. Grids with red shadow have negative labels, and grids with blue dots have positive labels. The number in each grid represents the relative magnitude of the probability mass function, and $a=3$ is a constant.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Theorem 2
  • Theorem 3