Guard-GBDT: Efficient Privacy-Preserving Approximated GBDT Training on Vertical Dataset

Anxiao Song, Shujie Cui, Jianli Bai, Ke Cheng, Yulong Shen, Giovanni Russello

TL;DR

Guard-GBDT tackles privacy-preserving GBDT training on vertically partitioned data by replacing MPC-unfriendly division and sigmoid operations with lookup-table approximations and a division-free gain, paired with a communication-efficient gradient aggregation protocol. It uses Additive Secret Sharing (ASS) and Function Secret Sharing (FSS) for secure two-party computation, and moves work into offline preprocessing to reduce online cost. Empirical results show that Guard-GBDT outperforms state-of-the-art baselines in both LAN and WAN settings while achieving accuracy competitive with plaintext XGBoost, which supports its practical viability for cross-organization collaboration. The framework is extensible to more parties and includes both secure training and prediction pipelines, with a publicly released implementation.

Abstract

In light of increasing privacy concerns and stringent legal regulations, using secure multiparty computation (MPC) to enable collaborative GBDT model training among multiple data owners has garnered significant attention. Despite this, existing MPC-based GBDT frameworks face efficiency challenges due to high communication costs and the computational burden of non-linear operations, such as division and sigmoid calculations. In this work, we introduce Guard-GBDT, a framework tailored for efficient and privacy-preserving GBDT training on vertical datasets. Guard-GBDT bypasses MPC-unfriendly division and sigmoid functions by using more streamlined approximations, and reduces communication overhead by compressing the messages exchanged during gradient aggregation. We implement a prototype of Guard-GBDT and extensively evaluate its performance and accuracy on various real-world datasets. The results show that Guard-GBDT outperforms the state-of-the-art HEP-XGB (CIKM'21) and SiGBDT (ASIA CCS'24) by up to $2.71\times$ and $12.21\times$ on LAN, and by up to $2.7\times$ and $8.2\times$ on WAN. Guard-GBDT also achieves accuracy comparable to SiGBDT and plaintext XGBoost (and better than HEP-XGB), deviating by only $\pm1\%$ to $\pm2\%$. Our implementation code is provided at https://github.com/XidianNSS/Guard-GBDT.git.
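To make the idea of a lookup-table sigmoid concrete, the following is a minimal plaintext sketch, not the paper's protocol. It assumes equal-width segments on $[-8, 8)$ and uses the sigmoid at each segment midpoint as the table entry; the segment count, the range, and the names `build_table` and `lut_sigmoid` are illustrative assumptions. The lookup index is obtained from comparisons against public thresholds only, which is the kind of operation that comparison-friendly MPC primitives handle well.

```python
import math


def sigmoid(x):
    """Exact sigmoid, used only to fill the table and to measure error."""
    return 1.0 / (1.0 + math.exp(-x))


def build_table(num_segments=16, lo=-8.0, hi=8.0):
    """Precompute thresholds and one output value per segment.

    The range [lo, hi) and the equal-width layout are assumptions made
    for illustration; they are not taken from the paper.
    """
    width = (hi - lo) / num_segments
    boundaries = [lo + k * width for k in range(num_segments + 1)]
    values = [0.0]  # inputs below lo saturate to 0
    values += [sigmoid(lo + (k + 0.5) * width) for k in range(num_segments)]
    values += [1.0]  # inputs at or above hi saturate to 1
    return boundaries, values


def lut_sigmoid(x, boundaries, values):
    """Approximate sigmoid(x) using comparisons and a table lookup only."""
    # Count how many thresholds x has reached; that count is the table index.
    index = sum(1 for b in boundaries if x >= b)
    return values[index]


def max_error(num_segments):
    """Largest absolute error on a grid over [-10, 10]."""
    boundaries, values = build_table(num_segments)
    grid = [i / 100.0 for i in range(-1000, 1001)]
    return max(abs(lut_sigmoid(x, boundaries, values) - sigmoid(x)) for x in grid)


if __name__ == "__main__":
    for n in (16, 64):
        print(n, "segments, max abs error:", max_error(n))
```

Increasing the number of segments shrinks the approximation error at the cost of a larger table and more comparisons, which is the trade-off a segmented design has to balance.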

Paper Structure

This paper contains 27 sections, 3 theorems, 22 equations, 8 figures, 5 tables, 5 algorithms.

Key Result

Theorem 1

If the arithmetic operations of ASS and DCF of FSS are secure against semi-honest adversaries, then our $\Pi_{\text{LUT}_{\delta}^{n}}$ protocol and $\Pi_{\text{LUT}_{w}^{n}}$ protocol are secure under the semi-honest adversaries model.

Figures (8)

  • Figure 1: Fixed-point encoding for the first-order gradients. In plaintext, $g_i \in \textbf{g}$ is in the range $[-1,1]$. In MPC, $g_i$ is encoded into a fixed-point number. The fraction and sign of $g_i$ carry significant data. The integer part of $g_i$ is filled with $0$'s for positive values or $1$'s for negative values, resulting in redundant information.
  • Figure 2: Our $\mathrm{sigmoid}$ function with multiple segments
  • Figure 3: Tree accuracy with different segment approximations.
  • Figure 4: Tree accuracy with different segment approximations, where $T=10$.
  • Figure 5: Microbenchmarks with input size of $10^5$
  • ...and 3 more figures
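
The redundancy described in the Figure 1 caption can be illustrated with a small plaintext sketch. It assumes a 64-bit ring and 16 fractional bits, both chosen here for illustration rather than taken from the paper, and the helper names `encode`, `decode`, `compress`, and `expand` are hypothetical. For a gradient in $[-1, 1]$, every bit above the sign position is a copy of the sign, so the low `FRAC_BITS + 2` bits are enough to rebuild the full encoding. The sketch covers only the plaintext encoding: it does not model secret shares, whose individual bits look random, and it does not reproduce the paper's aggregation protocol.

```python
RING_BITS = 64   # assumed ring size Z_{2^64}
FRAC_BITS = 16   # assumed number of fractional bits


def encode(x):
    """Encode a real number as a fixed-point element of Z_{2^RING_BITS}."""
    return round(x * (1 << FRAC_BITS)) % (1 << RING_BITS)


def decode(v):
    """Decode a ring element back to a real number (two's complement)."""
    if v >= 1 << (RING_BITS - 1):
        v -= 1 << RING_BITS
    return v / (1 << FRAC_BITS)


def compress(v, bits=FRAC_BITS + 2):
    """Keep only the low bits: fraction, one integer bit, and the sign."""
    return v & ((1 << bits) - 1)


def expand(c, bits=FRAC_BITS + 2):
    """Sign-extend a compressed value back to the full ring."""
    if c >> (bits - 1):  # sign bit of the short representation is set
        c -= 1 << bits
    return c % (1 << RING_BITS)


if __name__ == "__main__":
    for g in (-1.0, -0.3, 0.0, 0.25, 1.0):
        full = encode(g)
        short = compress(full)
        # The short form is lossless for values in [-1, 1].
        assert expand(short) == full
        print(f"g={g:+.2f} full={full:#018x} short={short:#07x}")
```

With these assumed parameters the short form uses 18 of the 64 bits, which gives a sense of how much of a fixed-point gradient encoding carries no information.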

Theorems & Definitions (9)

  • Definition 1: Syntax of FSS [boyle2016function]
  • Definition 2: Semi-honest Security [canetti2000security, lindell2017simulate]
  • Definition 3: Correctness and Security of FSS [boyle2016function]
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof