TREE: Tree Regularization for Efficient Execution

Lena Schmid, Daniel Biebert, Christian Hakert, Kuan-Hsun Chen, Michel Lang, Markus Pauly, Jian-Jia Chen

TL;DR

This work tackles accelerating tree-based inference on resource-constrained devices by introducing Tree Regularization, a training-time penalty that rewards uneven splits in CART to shorten inference paths. By adding a regularization term $\lambda\cdot R$ to the Gini impurity, and tuning $\lambda$ via the expected depth $E[\text{depth}]$, the method achieves substantial execution-time gains (up to ~4×) with only modest accuracy degradation, particularly on binary or large-sample datasets. Extensive experiments on eleven UCI datasets and synthetic data demonstrate the trade-offs and identify data properties that correlate with regularization robustness, while an implementation in scikit-learn facilitates practical adoption. The approach complements cache-aware layout optimizations by preserving frequently accessed paths while reducing model size, offering a practical pathway to hardware-aware efficient execution in tree ensembles.
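
The expected depth mentioned above can be illustrated with a small sketch. It assumes that the probability of reaching a leaf is estimated by the share of training samples that end up in it; the `Node` class and the two toy trees are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node that stores only its training sample count."""
    n_samples: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def expected_depth(root: Node) -> float:
    """Sum over all leaves of P(leaf) * depth(leaf).

    P(leaf) is estimated as leaf.n_samples / root.n_samples (an assumption).
    """
    total = root.n_samples

    def walk(node: Node, depth: int) -> float:
        if node.left is None and node.right is None:
            return depth * node.n_samples / total
        return walk(node.left, depth + 1) + walk(node.right, depth + 1)

    return walk(root, 0)


# Two toy trees with four leaves each: one with even splits, one with uneven splits.
balanced = Node(100, Node(50, Node(25), Node(25)), Node(50, Node(25), Node(25)))
skewed = Node(100, Node(90), Node(10, Node(6), Node(4, Node(2), Node(2))))

print(expected_depth(balanced), expected_depth(skewed))
```

In this toy case the skewed tree has a larger maximum depth than the balanced one, yet its expected depth is lower because most samples leave the tree after a single comparison.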

Abstract

The rise of machine learning methods on heavily resource-constrained devices requires not only choosing a suitable model architecture for the target platform, but also optimizing the inference time of the chosen model so that the available resources are used as well as possible. Random forests and decision trees have been shown to suit this scenario: their total model size is highly tunable, and their execution can be optimized for the underlying memory architecture. In addition to the straightforward strategy of enforcing shorter paths through decision trees, and hence reducing inference time, hardware-aware implementations can optimize execution time in an orthogonal manner. One such hardware-aware optimization is to lay out decision trees in memory in such a way that higher-probability paths are less likely to be evicted from system caches. This works particularly well when splits within tree nodes are uneven, so that one of the child nodes is visited with high probability. In this paper, we present a method to reduce path lengths by rewarding uneven probability distributions during the training of decision trees, at the cost of a minimal accuracy degradation. Specifically, we regularize the impurity computation of the CART algorithm so that split evaluation favors not only low impurity but also highly asymmetric distributions, which offers high optimization potential for a memory-architecture-aware implementation. We show that, especially for binary classification data sets and data sets with many samples, this form of regularization can reduce the execution time by a factor of up to approximately four with minimal accuracy degradation.
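
The idea of regularizing the impurity computation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not spell out the regularizer $R$, so the `balance_penalty` below (0 for a maximally uneven split, 1 for a perfectly even one) is a hypothetical stand-in, and the helper names `regularized_split_score` and `best_threshold` are invented for this example.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity of a label array; an empty array counts as pure."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))


def regularized_split_score(y_left: np.ndarray, y_right: np.ndarray, lam: float) -> float:
    """Weighted Gini impurity plus lam * R; lower scores are better.

    ASSUMPTION: R = 2 * min(n_left, n_right) / n is a hypothetical choice
    that penalizes even splits. The paper's actual R may differ.
    """
    n_left, n_right = y_left.size, y_right.size
    n = n_left + n_right
    weighted_gini = (n_left * gini(y_left) + n_right * gini(y_right)) / n
    balance_penalty = 2.0 * min(n_left, n_right) / n
    return weighted_gini + lam * balance_penalty


def best_threshold(x: np.ndarray, y: np.ndarray, lam: float):
    """Exhaustive search over midpoints of one feature, CART-style."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_score, best_thr = np.inf, None
    for i in range(1, x_sorted.size):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate identical feature values
        score = regularized_split_score(y_sorted[:i], y_sorted[i:], lam)
        if score < best_score:
            best_score = score
            best_thr = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_score, best_thr


x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
for lam in (0.0, 0.1, 1.0):
    print(lam, best_threshold(x, y, lam))
```

With `lam = 0` the search reduces to the ordinary Gini criterion and picks the even, perfectly separating split. As `lam` grows, the penalty on balanced splits eventually outweighs the impurity gain and an uneven split wins, which is the accuracy-versus-path-length trade-off the abstract describes.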

Paper Structure

This paper contains 12 sections, 4 equations, 25 figures, and 4 tables.

Figures (25)

  • Figure 1: Impact of regularization on execution time across datasets
  • Figure 2: Normalized balanced accuracy values (with respect to the maximum value within each data set) and relative execution times for all data sets with different tree depths and separated by cache optimization or not.
  • Figure 3: Relative frequency of maximum regularization factors in each data set, focusing on the range of values above 40.
  • Figure 4: Simulation results of the regularization for varying $b$ and $\Delta_\mu$: $b=0.9, \Delta_\mu=8$ (red), $b=0.7, \Delta_\mu=3$ (green) and $b=0.5, \Delta_\mu=1$ (blue). The different shapes of the points indicate whether a cache-optimized version is used (circle) or not (triangle).
  • Figure 5: Evaluation of the balanced accuracy for the UCI datasets
  • ...and 20 more figures