SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree

Tao Fan, Weijing Chen, Guoqiang Ma, Yan Kang, Lixin Fan, Qiang Yang

TL;DR

The paper addresses privacy-preserving training of gradient boosting decision trees (GBDT) under vertical federated learning at large scale. It introduces SecureBoost+, which combines ciphertext operation optimizations (GH packing, ciphertext compression, histogram subtraction), new training mechanisms (Mix Tree, Layered Tree), and multi-output (MO) trees that accelerate multi-class training, all under the semi-honest security model. Experiments show SecureBoost+ training 6–35x faster than SecureBoost at the same accuracy and scaling to tens of millions of samples with thousands of features. These optimizations reduce cryptographic and communication overhead, making privacy-preserving GBDT practical for large-scale vertical FL deployments.
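Histogram subtraction, one of the optimizations named above, is a general GBDT trick: once the parent node's histogram and one child's histogram are known, the sibling's histogram is their bin-wise difference, so only one child per split has to be accumulated from samples. The sketch below illustrates the idea on plaintext numbers. It is not the paper's implementation: the function names are hypothetical, and in the federated setting the bin sums would be ciphertexts combined with homomorphic operations rather than Python floats.

```python
from typing import List, Sequence, Tuple

# One histogram = per-bin (sum of gradients, sum of hessians).
Histogram = List[Tuple[float, float]]


def build_histogram(bins: Sequence[int], grads: Sequence[float],
                    hess: Sequence[float], n_bins: int) -> Histogram:
    """Accumulate gradient and hessian sums per feature bin (hypothetical helper)."""
    g_sum = [0.0] * n_bins
    h_sum = [0.0] * n_bins
    for b, g, h in zip(bins, grads, hess):
        g_sum[b] += g
        h_sum[b] += h
    return list(zip(g_sum, h_sum))


def subtract_histogram(parent: Histogram, child: Histogram) -> Histogram:
    """Derive the sibling histogram as parent minus child, bin by bin."""
    return [(pg - cg, ph - ch) for (pg, ph), (cg, ch) in zip(parent, child)]


# Toy data: six samples of one feature, already bucketed into three bins.
bins = [0, 1, 1, 2, 0, 2]
grads = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hess = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
left_idx = [0, 1, 3]   # samples routed to the left child
right_idx = [2, 4, 5]  # samples routed to the right child


def take(values, idx):
    return [values[i] for i in idx]


parent_hist = build_histogram(bins, grads, hess, 3)
left_hist = build_histogram(take(bins, left_idx), take(grads, left_idx),
                            take(hess, left_idx), 3)
# The right child is obtained without touching its samples.
right_hist = subtract_histogram(parent_hist, left_hist)
```

In practice the smaller child is usually the one built directly, so the expensive accumulation touches fewer than half of the parent's samples.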

Abstract

Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry because of its good performance and easy interpretation. Because of data isolation and privacy requirements, many works use vertical federated learning to train machine learning models collaboratively, with privacy guarantees, across different data owners. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, to achieve privacy preservation, SecureBoost involves complex training procedures and time-consuming cryptographic operations. As a result, SecureBoost is slow to train and does not scale to large datasets. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. SecureBoost+ is secure in the semi-honest model, the same security model as SecureBoost, and scales easily to tens of millions of data samples. SecureBoost+ achieves high performance through several novel optimizations for SecureBoost, including ciphertext operation optimization, new training mechanisms, and multi-classification training optimization. The experimental results show that SecureBoost+ is 6-35x faster than SecureBoost with the same accuracy, and that it scales to tens of millions of data samples and thousands of feature dimensions.
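To give a feel for what a ciphertext operation optimization such as GH packing can look like, the sketch below packs a gradient g and a hessian h into one integer so that a single addition updates both sums. It is only an illustration under stated assumptions, not the paper's scheme: the constants `SCALE`, `H_BITS`, and `G_OFFSET` and both function names are hypothetical, gradients are assumed to satisfy |g| <= 1, and plain Python integers stand in for the ciphertexts of an additively homomorphic encryption scheme.

```python
SCALE = 1 << 16   # fixed-point scale (assumed)
H_BITS = 40       # low bits reserved for the summed hessian (assumed)
G_OFFSET = 1.0    # shift that makes g + G_OFFSET non-negative (assumes |g| <= 1)


def pack_gh(g: float, h: float) -> int:
    """Encode g and h as fixed-point integers and place them in one integer."""
    g_int = int(round((g + G_OFFSET) * SCALE))
    h_int = int(round(h * SCALE))
    # g occupies the high bits, h the low H_BITS bits.
    return (g_int << H_BITS) | h_int


def unpack_gh_sum(packed_sum: int, n_samples: int) -> tuple:
    """Recover (sum of g, sum of h) from a sum of n_samples packed values."""
    h_int = packed_sum & ((1 << H_BITS) - 1)
    g_int = packed_sum >> H_BITS
    # Each sample contributed one G_OFFSET, so remove n_samples of them.
    return g_int / SCALE - n_samples * G_OFFSET, h_int / SCALE


# Three samples; one integer addition per sample updates both sums.
grads = [0.5, -0.25, 0.125]
hess = [0.25, 0.1875, 0.125]
packed_total = sum(pack_gh(g, h) for g, h in zip(grads, hess))
g_total, h_total = unpack_gh_sum(packed_total, len(grads))
```

The layout only works while the summed hessian stays below 2^`H_BITS`; otherwise it would carry into the gradient field. With encryption in place, packing would roughly halve the number of encryptions and homomorphic additions compared with handling g and h separately.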

Paper Structure

This paper contains 32 sections, 19 equations, 9 figures, 3 tables, 2 algorithms.

Figures (9)

  • Figure 1: The Process of GH Packing.
  • Figure 2: The Process of Ciphertext Compression.
  • Figure 3: The Mix Tree Mode.
  • Figure 4: The Layered Tree Mode.
  • Figure 5: Traditional Trees and Multi-Output Trees.
  • ...and 4 more figures