Asynch-SGBDT: Asynchronous Parallel Stochastic Gradient Boosting Decision Tree based on Parameters Server

Cheng Daning, Xia Fen, Li Shigang, Zhang Yunquan

TL;DR

Asynch-SGBDT targets the high cost of training GBDT by enabling asynchronous parallel training on a parameter server. It reframes stochastic GBDT as a stochastic optimization problem via a randomized objective $L_{random}$ and proves convergence bounds under asynchrony that depend on dataset diversity, sampling rate, step size, and tree size. Empirical results on real-sim, Higgs, and E2006-log1p show speedups of up to 14x–20x with 32 workers, significantly surpassing synchronous baselines such as LightGBM in high-dimensional sparse settings. The work provides practical guidelines (the asynch-SGBDT requirements) for achieving scalability and shows that asynchronous training can effectively exploit parameter-server architectures for GBDT when the data and model size are favorable.
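Why minimizing a randomized objective can stand in for minimizing the full loss can be checked in one line under a simple sampling model. This is a minimal illustration, not the paper's derivation: it assumes each of the $n$ samples is kept independently with probability $\rho$ (the sampling rate) and reweighted by $1/\rho$. The symbols $\ell_i$, $z_i$, and $\rho$ are introduced here, and the paper's exact definition of $L_{random}$ may differ.

```latex
L(F) = \sum_{i=1}^{n} \ell_i(F), \qquad
L_{random}(F) = \frac{1}{\rho} \sum_{i=1}^{n} z_i \, \ell_i(F), \qquad
z_i \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\rho)

E[L_{random}(F)] = \frac{1}{\rho} \sum_{i=1}^{n} E[z_i] \, \ell_i(F)
                 = \frac{1}{\rho} \sum_{i=1}^{n} \rho \, \ell_i(F)
                 = L(F)
\quad\Longrightarrow\quad
\arg\min_F E[L_{random}(F)] = \arg\min_F L(F)
```

Dropping the $1/\rho$ factor gives $E[L_{random}] = \rho L$, a positive rescaling with the same minimizer, so the conclusion does not hinge on that normalization.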

Abstract

Machine learning is the most widely used tool in AI research and industry. One of its most important algorithms is the Gradient Boosting Decision Tree (GBDT), whose training process requires considerable computational resources and time. To shorten GBDT training time, many works have deployed GBDT on a parameter server. However, these are synchronous parallel algorithms, which fail to make full use of the parameter server. In this paper, we examine the possibility of training GBDT models with asynchronous parallel methods and name the resulting algorithm asynch-SGBDT (asynchronous parallel stochastic gradient boosting decision tree). Our theoretical and experimental results indicate that the scalability of asynch-SGBDT is influenced by the sample diversity of the dataset, the sampling rate, the step length, and the GBDT tree settings. Experimental results also show that asynch-SGBDT training reaches a linear speedup in the asynchronous parallel setting when the datasets and GBDT trees meet the high-scalability requirements.

Paper Structure

This paper contains 42 sections, 41 equations, 10 figures, and 3 algorithms.

Figures (10)

  • Figure 1: Using SGD to train the stochastic GBDT: $L$ and $E[L_{random}]$ have the same minimum, and minimizing $E[L_{random}]$ is computationally efficient.
  • Figure 2: Asynch-SGBDT on the parameter server: worker 1 and worker 2 work independently and asynchronously in parallel. The server updates $L'_{random}$ immediately whenever it receives a tree from any worker.
  • Figure 3: Different GBDT Training Method Patterns
  • Figure 4: The sample diversity of a dataset influences the sparsity of $\mathbf{Q}$
  • Figure 5: Asynch-SGBDT with different numbers of workers and the same sampling rate on the Higgs dataset
  • ...and 5 more figures
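The worker/server protocol described for Figure 2 can be sketched as a toy, single-process program. This is a minimal sketch under stated assumptions, not the authors' implementation: Python threads stand in for workers, the loss is squared error on one scalar feature, each "tree" is a depth-1 stump, and row sampling is Bernoulli with rate `rate`. All names (`Stump`, `fit_stump`, `ParameterServer`, `worker`, `train`, `mse`) are hypothetical.

```python
# Hypothetical sketch of asynchronous stochastic GBDT on a toy parameter
# server. Threads stand in for workers; this is NOT the authors' code.
import random
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """Depth-1 regression tree on a single scalar feature."""
    threshold: float
    left: float
    right: float

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


def fit_stump(xs, residuals):
    """Fit a stump to residuals by exhaustive threshold search (min squared error)."""
    pairs = sorted(zip(xs, residuals))
    best_sse, best = float("inf"), None
    for i in range(1, len(pairs)):
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if sse < best_sse:
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse, best = sse, Stump(thr, lm, rm)
    return best


class ParameterServer:
    """Holds the global ensemble and appends each tree as soon as it arrives."""

    def __init__(self, step: float):
        self.step = step  # step length (shrinkage) applied to every tree
        self.trees = []
        self.lock = threading.Lock()

    def snapshot(self):
        with self.lock:
            return list(self.trees)

    def push(self, tree):
        # No barrier: the update is applied immediately, whichever worker sent it.
        with self.lock:
            self.trees.append(tree)

    def predict(self, x, trees):
        return sum(self.step * t.predict(x) for t in trees)


def worker(server, xs, ys, rate, n_trees, seed):
    rng = random.Random(seed)
    for _ in range(n_trees):
        trees = server.snapshot()  # may already be stale when the tree is pushed
        idx = [i for i in range(len(xs)) if rng.random() < rate]  # row sampling
        if len(idx) < 2:
            continue
        # For squared loss 0.5 * (y - F)^2 the negative gradient is y - F.
        res = [ys[i] - server.predict(xs[i], trees) for i in idx]
        server.push(fit_stump([xs[i] for i in idx], res))


def train(xs, ys, n_workers=4, trees_per_worker=25, rate=0.5, step=0.1):
    server = ParameterServer(step)
    threads = [
        threading.Thread(target=worker, args=(server, xs, ys, rate, trees_per_worker, s))
        for s in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server


def mse(server, xs, ys):
    trees = server.snapshot()
    return sum((y - server.predict(x, trees)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0, 1) for _ in range(200)]
    ys = [1.0 if x > 0.5 else -1.0 for x in xs]
    server = train(xs, ys)
    print(len(server.snapshot()), "trees, training MSE", round(mse(server, xs, ys), 4))
```

Because `push` takes no barrier, a tree fitted against a stale snapshot is still added to the ensemble. That staleness is what distinguishes this setting from a synchronous baseline, in which every worker would wait at a barrier before starting the next round.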

Theorems & Definitions (2)

  • proof
  • proof