FederBoost: Private Federated Learning for GBDT

Zhihua Tian, Rui Zhang, Xiaoyang Hou, Lingjuan Lyu, Tianyi Zhang, Jian Liu, Kui Ren

TL;DR

FederBoost tackles privacy-preserving federated learning for gradient boosting decision trees (GBDT) with support for vertically and horizontally partitioned data. The key idea is that GBDT training relies on the relative ordering of samples, not their actual values, enabling a non-cryptographic vertical protocol and a lightweight secure-aggregation-based horizontal protocol augmented by bucketization and differential privacy. The authors implement a full system and demonstrate that FederBoost matches centralized accuracy on three public datasets while achieving 4–5 orders of magnitude speedups over state-of-the-art federated decision-tree frameworks. These results suggest FederBoost offers a practical privacy-preserving solution for industrial deployments requiring efficient GBDT training without centralized data collection.
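To make "lightweight secure aggregation" concrete, here is a minimal sketch of the generic pairwise-masking idea: each pair of parties agrees on a random mask that one adds and the other subtracts, so the masks cancel in the sum and only the total is revealed. This is the textbook technique, not necessarily the exact protocol FederBoost uses. The names `MODULUS`, `mask_inputs`, and `aggregate` are hypothetical, and a single seeded generator stands in for the pairwise shared secrets that real parties would derive through key agreement.

```python
import random

MODULUS = 2**32  # assumption: all arithmetic is done modulo a fixed public value


def mask_inputs(values, seed=0):
    """Return one masked value per party; the pairwise masks cancel in the sum.

    A single seeded generator stands in for the pairwise shared secrets
    (an illustration-only shortcut, not a secure key agreement).
    """
    rng = random.Random(seed)
    n = len(values)
    masked = [v % MODULUS for v in values]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.randrange(MODULUS)
            masked[i] = (masked[i] + r) % MODULUS  # party i adds the mask
            masked[j] = (masked[j] - r) % MODULUS  # party j subtracts it
    return masked


def aggregate(masked):
    """The aggregator sums the masked values and learns only the total."""
    return sum(masked) % MODULUS


# Hypothetical example: three parties each hold a local statistic for one bucket.
local_values = [5, 11, 7]
total = aggregate(mask_inputs(local_values))
```

In a horizontal setting, the values being summed could be per-bucket statistics held by each participant; that choice of payload is an assumption made here for illustration.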

Abstract

Federated Learning (FL) has been an emerging trend in machine learning and artificial intelligence. It allows multiple participants to collaboratively train a better global model and offers a privacy-aware paradigm for model training since it does not require participants to release their original training data. However, existing FL solutions for vertically partitioned data or decision trees require heavy cryptographic operations. In this paper, we propose a framework named FederBoost for private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both vertically and horizontally partitioned data. Vertical FederBoost does not require any cryptographic operation and horizontal FederBoost only requires lightweight secure aggregation. The key observation is that the whole training process of GBDT relies on the ordering of the data instead of the values. We fully implement FederBoost and evaluate its utility and efficiency through extensive experiments performed on three public datasets. Our experimental results show that both vertical and horizontal FederBoost achieve the same level of accuracy as centralized training, where all data are collected in a central server, and that they are 4-5 orders of magnitude faster than the state-of-the-art solutions for federated decision tree training, hence offering practical solutions for industrial applications.
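The key observation can be checked on a toy example. The sketch below searches for the best threshold split on a single feature and shows that a strictly increasing transform of the feature values leaves the chosen partition unchanged, because only the sort order enters the computation. The gain formula is a simplified XGBoost-style score with unit Hessians, used here as an assumption for illustration. The function `best_split` and the sample data are hypothetical and not taken from the paper.

```python
import math


def best_split(values, grads, lam=1.0):
    """Return (gain, left_index_set) of the best threshold split on one feature.

    Assumption: unit Hessians, so the score is G_L^2/(n_L+lam) + G_R^2/(n_R+lam).
    """
    n = len(values)
    # Only the sort order of the feature values is used from here on.
    order = sorted(range(n), key=lambda i: (values[i], i))
    total = sum(grads)
    best_gain, best_left = float("-inf"), frozenset()
    g_left = 0.0
    for pos in range(n - 1):
        g_left += grads[order[pos]]
        # A threshold can only separate samples with distinct feature values.
        if values[order[pos]] == values[order[pos + 1]]:
            continue
        n_left = pos + 1
        g_right = total - g_left
        gain = g_left**2 / (n_left + lam) + g_right**2 / (n - n_left + lam)
        if gain > best_gain:
            best_gain, best_left = gain, frozenset(order[:n_left])
    return best_gain, best_left


values = [3.0, 1.0, 2.0, 5.0, 4.0, 2.0]
grads = [0.5, -1.0, -0.8, 1.2, 0.9, -0.7]

# A strictly increasing transform changes the values but not their order.
transformed = [math.exp(v) for v in values]

original_result = best_split(values, grads)
transformed_result = best_split(transformed, grads)
```

Both calls select the same set of samples for the left child, which is why a party can in principle contribute an ordering or bucket assignment of its samples instead of the raw feature values.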

Paper Structure

This paper contains 16 sections, 1 theorem, 5 equations, 3 figures, 3 tables, and 4 algorithms.

Key Result

Corollary 1

Our bucketization mechanism satisfies $\varepsilon$-element-level local differential privacy.
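This summary does not spell out the bucketization mechanism, so the following is only a sketch of one standard way to obtain $\varepsilon$-local differential privacy over a finite set of $q$ buckets: generalized randomized response. A sample keeps its true bucket with probability $e^{\varepsilon}/(e^{\varepsilon}+q-1)$ and otherwise moves to one of the other $q-1$ buckets uniformly at random, so the ratio between any two output probabilities is at most $e^{\varepsilon}$. Whether this matches the paper's exact mechanism and its element-level definition is an assumption. The names `keep_probability` and `perturb_bucket` are hypothetical.

```python
import math
import random


def keep_probability(epsilon, num_buckets):
    """Probability of reporting the true bucket under generalized randomized response."""
    e = math.exp(epsilon)
    return e / (e + num_buckets - 1)


def perturb_bucket(true_bucket, num_buckets, epsilon, rng=random):
    """Report the true bucket with the keep probability, else a uniform other bucket."""
    if rng.random() < keep_probability(epsilon, num_buckets):
        return true_bucket
    # Draw uniformly from the num_buckets - 1 buckets other than the true one.
    other = rng.randrange(num_buckets - 1)
    return other if other < true_bucket else other + 1


# Hypothetical usage: 4 buckets, privacy budget epsilon = 1.0.
rng = random.Random(0)
reports = [perturb_bucket(2, 4, 1.0, rng) for _ in range(1000)]
```

Smaller values of $\varepsilon$ push the keep probability toward $1/q$, which adds more noise to the bucket assignment. Larger values approach the unperturbed assignment.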

Figures (3)

  • Figure 1: Data partitions for federated learning.
  • Figure 2: Utility of $\textsf{FederBoost}$ with different numbers of buckets (left) and different levels of DP (right).
  • Figure 3: Training time of $\textsf{FederBoost}$ with different numbers of participants over LAN (left) and WAN (right).

Theorems & Definitions (3)

  • Definition 3.1: $\varepsilon$-Local Element-Level DP
  • Corollary 1
  • proof