GTree: GPU-Friendly Privacy-preserving Decision Tree Training and Inference

Qifan Wang; Shujie Cui; Lei Zhou; Ye Dong; Jianli Bai; Yun Sing Koh; Giovanni Russello

TL;DR

GTree is the first GPU-accelerated framework for privacy-preserving decision-tree (DT) training and inference, built on three-party replicated secret sharing. By encoding the DT and the data as arrays and using oblivious, GPU-friendly protocols (including oblivious array access and layer-wise training), it hides the data, the tree structure, access patterns, and statistics, revealing only the tree depth and the size of the data samples. Experiments show substantial speedups over the most efficient CPU-based baseline (≈11× and ≈21× for training on SPECT and Adult, respectively) and strong inference performance for trees with fewer than 10 levels. GTree is secure against one semi-honest party and leaks less than prior work, which also reveals the tree structure. The work shows that GPU-enabled privacy-preserving DTs are practical and points to future improvements in ORAM-based access, support for continuous features, and broader data-type support.
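To make the three-party setting concrete, here is a minimal Python sketch of 2-out-of-3 replicated secret sharing over the ring $\mathbb{Z}_{2^{32}}$. The ring size, the helper names (share, reconstruct, add, mult_local), and the use of Python's secrets module are illustrative assumptions, not GTree's GPU implementation. Each party holds two of three additive shares, so any two parties can reconstruct a secret, addition is local, and a local product step yields a 3-out-of-3 additive sharing of the product; the resharing round that restores the replicated form is omitted.

```python
import secrets

MOD = 1 << 32  # ring Z_{2^32}; the bit width is an illustrative assumption
Pair = tuple[int, int]


def share(x: int) -> list[Pair]:
    # Split x into additive shares x0 + x1 + x2 = x (mod 2^32);
    # party i receives the replicated pair (x_i, x_{i+1}).
    x0 = secrets.randbelow(MOD)
    x1 = secrets.randbelow(MOD)
    x2 = (x - x0 - x1) % MOD
    parts = (x0, x1, x2)
    return [(parts[i], parts[(i + 1) % 3]) for i in range(3)]


def reconstruct(p_i: Pair, p_next: Pair) -> int:
    # Party i holds (x_i, x_{i+1}); party i+1 holds (x_{i+1}, x_{i+2}).
    # Together they know all three additive shares.
    return (p_i[0] + p_i[1] + p_next[1]) % MOD


def add(a: list[Pair], b: list[Pair]) -> list[Pair]:
    # Adding two shared values needs no communication.
    return [((ai[0] + bi[0]) % MOD, (ai[1] + bi[1]) % MOD) for ai, bi in zip(a, b)]


def mult_local(a: list[Pair], b: list[Pair]) -> list[int]:
    # Each party computes x_i*y_i + x_i*y_{i+1} + x_{i+1}*y_i. The three
    # results form an additive (3-out-of-3) sharing of x*y; a resharing
    # round (omitted here) would convert them back to replicated form.
    return [
        (xi * yi + xi * yn + xn * yi) % MOD
        for (xi, xn), (yi, yn) in zip(a, b)
    ]


if __name__ == "__main__":
    s = share(42)
    print(reconstruct(s[0], s[1]))                    # 42
    print(reconstruct(s[2], s[0]))                    # 42
    print(reconstruct(*add(share(5), share(7))[:2]))  # 12
    print(sum(mult_local(share(6), share(7))) % MOD)  # 42
```

Because every share is uniformly random on its own, a single party's pair reveals nothing about the secret, which is the property the semi-honest, one-corruption threat model relies on.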

Abstract

The decision tree (DT) is a widely used machine learning model due to its versatility, speed, and interpretability. However, for privacy-sensitive applications, outsourcing DT training and inference to cloud platforms raises data-privacy concerns. Researchers have developed privacy-preserving approaches for DT training and inference using cryptographic primitives such as Secure Multi-Party Computation (MPC). Although these approaches have made progress, they still suffer from heavy computation and communication overheads. A few recent works employ Graphics Processing Units (GPUs) to improve the performance of MPC-protected deep learning. This raises a natural question: \textit{can MPC-protected DT training and inference be accelerated by GPU?} We present GTree, the first scheme that uses GPUs to accelerate MPC-protected DT training and inference. GTree involves 3 parties who securely and jointly perform each step of DT training and inference on GPUs, and each MPC protocol in GTree is designed to be GPU-friendly. The performance evaluation shows that GTree achieves ${\thicksim}11{\times}$ and ${\thicksim}21{\times}$ improvements in training on the SPECT and Adult datasets, respectively, compared to the most efficient prior CPU-based work. For inference, GTree is more efficient when the DT has fewer than 10 levels: it is $126\times$ faster than the most efficient prior work when inferring $10^4$ instances with a 7-level tree. GTree also achieves a stronger security guarantee than prior solutions: it leaks only the tree depth and the size of the data samples, whereas prior solutions also leak the tree structure. With \textit{oblivious array access}, the access pattern on the GPU is also protected.

Paper Structure (27 sections, 5 theorems, 1 equation, 5 figures, 6 tables, 6 algorithms)

This paper contains 27 sections, 5 theorems, 1 equation, 5 figures, 6 tables, 6 algorithms.

Introduction
Background
Decision Tree
Secret Sharing
Threat Model of GTree
Data and Model Representation
Data Representation
Model Representation
Construction Details of GTree
Oblivious Array Access
Oblivious DT Training
Oblivious Learning: Data Partition
Oblivious Learning: Statistics Counting
Oblivious Heuristic Computation
Oblivious Node Split
...and 12 more sections

Key Result

Theorem 1

${\prod}_{\rm OAA}$ securely realizes $\mathcal{F}_{\rm OAA}$, in the presence of one semi-honest party in the ($\mathcal{F}_{\rm Mult}$, $\mathcal{F}_{\rm EQ}$, $\mathcal{F}_{\rm SelectShare}$)-hybrid model.
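Theorem 1 analyzes oblivious array access in a hybrid model with multiplication, equality-test, and share-selection functionalities. One common way to build an oblivious read from such components is a linear scan, $\bm{A}[\mathit{idx}] = \sum_i \mathrm{EQ}(i, \mathit{idx}) \cdot \bm{A}[i]$, which touches every element regardless of the secret index. The Python sketch below shows only this arithmetic in the clear: eq and select are plaintext stand-ins for $\mathcal{F}_{\rm EQ}$ and $\mathcal{F}_{\rm SelectShare}$, which a real protocol would evaluate on secret shares, and the function names are hypothetical. It is not necessarily how ${\prod}_{\rm OAA}$ is constructed.

```python
MOD = 1 << 32  # ring size, an illustrative assumption


def eq(a: int, b: int) -> int:
    # Plaintext stand-in for F_EQ; under MPC this returns a secret-shared bit.
    return 1 if a == b else 0


def select(bit: int, x: int, y: int) -> int:
    # Plaintext stand-in for F_SelectShare: x if bit == 1 else y, written
    # arithmetically as y + bit * (x - y) so it needs one multiplication.
    return (y + bit * (x - y)) % MOD


def oblivious_read(arr: list[int], idx: int) -> int:
    # Visit every slot; only the slot whose position equals idx contributes.
    acc = 0
    for i, value in enumerate(arr):
        acc = (acc + eq(i, idx) * value) % MOD
    return acc


def oblivious_write(arr: list[int], idx: int, new_value: int) -> list[int]:
    # Rewrite every slot; only the slot at idx actually changes.
    return [select(eq(i, idx), new_value, value) for i, value in enumerate(arr)]


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    print(oblivious_read(data, 2))       # 30
    print(oblivious_write(data, 1, 99))  # [10, 99, 30, 40]
```

The cost is linear in the array length per access, but every comparison and product is independent of the others, so the scan maps naturally onto many GPU threads.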

Figures (5)

Figure 1: Representation of an example tree in GTree. The tree is represented with two arrays $\bm{T}$ and $\bm{F}$, where $|\bm{T}|=|\bm{F}|$. Node $\bm{T}[i]$ is an internal, leaf, or dummy node when $\bm{F}[i]=0$, $1$, or $2$, respectively. If $\bm{T}[i]$ is an internal node, it stores the assigned feature, e.g., $\bm{T}[0]=s_1$ and $\bm{T}[2]=s_2$. For dummy and leaf nodes in the last level, $\bm{T}[i]$ stores the label of the path, e.g., $\bm{T}[3]=r_0$ and $\bm{T}[5]=r_1$; for any other node, $\bm{T}[i]$ stores a random feature, e.g., $\bm{T}[1]=s_0$, although $\bm{T}[1]$ is actually a leaf. Although $\bm{T}[3]$ and $\bm{T}[4]$ are dummy nodes, they act as leaves and store the label of their path because they are in the last level. This design not only hides the tree shape but also enables highly parallel training and inference.
Figure 2: Counter array in GTree, denoted as $\bm{C}$. The array has 3 rows and $2(d-1)$ columns. In the first row, $\bm{C}[0][2k+j]$ stores the number of data samples containing $v_{k,j}$, where $k\in[0, d-2]$ and $j\in\{0, 1\}$. In the last two rows, $\bm{C}[1][2k+j]$ and $\bm{C}[2][2k+j]$ store the counts of $(v_{k,j}, 0)$ and $(v_{k,j}, 1)$ pairs, respectively. This layout also helps GTree take better advantage of GPU parallelism.
Figure 3: Computation (Comp.) and Communication (Comm.) time of training.
Figure 4: Comp. and comm. time of inference. The results are amortized over $10^4$ instances, i.e., $10^4$ instances are inferred at once.
Figure 5: Performance of $\mathtt{SGX~only}$ and GTree for training and inference. Note that the $y$-axis of the third subfigure is in logarithmic scale.
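The array layout of Figure 1 turns inference into a fixed-length walk. The Python sketch below assumes a heap-style layout (the children of node $i$ sit at $2i+1$ and $2i+2$, which matches Figure 1, where $\bm{T}[3]$ and $\bm{T}[4]$ lie below $\bm{T}[1]$), binary features, and that feature value 0 selects the left child; the feature indices and label values in the example are hypothetical. For a tree with $h$ levels, every path takes exactly $h-1$ steps and never reads $\bm{F}$, so the walk reveals only the depth; under MPC, each lookup of $\bm{T}[i]$ and of the selected feature would be an oblivious array access.

```python
# Hypothetical encoding of a 3-level tree shaped like Figure 1.
# Entries of T are feature indices above the last level and labels in the last level.
F = [0, 1, 0, 2, 2, 1, 1]  # 0 = internal, 1 = leaf, 2 = dummy (as in Figure 1)
T = [1, 0, 2, 0, 0, 1, 0]
# T[0] = s_1 and T[2] = s_2 are split features; T[1] = s_0 is a random
# feature stored in a leaf; T[3] = T[4] = r_0 repeat that leaf's label;
# T[5] = r_1 and T[6] are labels of real leaves. Label values are made up.


def infer(T: list[int], levels: int, x: list[int]) -> int:
    """Walk exactly levels-1 steps; F is never read, so leaves are not revealed."""
    i = 0
    for _ in range(levels - 1):
        bit = x[T[i]]        # binary feature value chosen by node i (assumed 0/1)
        i = 2 * i + 1 + bit  # 0 -> left child, 1 -> right child (assumed convention)
    return T[i]              # last-level entry holds the label of the path


if __name__ == "__main__":
    print(infer(T, 3, [1, 0, 1]))  # s_1 = 0 -> leaf T[1] -> dummy T[4] -> 0
    print(infer(T, 3, [0, 1, 0]))  # s_1 = 1 -> T[2] tests s_2 = 0 -> T[5] -> 1
```

The first call shows why the dummy nodes matter: the path reaches the real leaf $\bm{T}[1]$ after one step, yet it continues to a dummy node carrying the same label, so it has the same length as every other path.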
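The counter array in Figure 2 can be filled without branching on data values by turning each binary feature $x$ into the indicator pair $(1-x, x)$ for $v_{k,0}$ and $v_{k,1}$. This Python sketch assumes binary features and binary labels, that each row has $d$ entries with the label last (so there are $d-1$ features), and that $v_{k,j}$ means "feature $k$ has value $j$"; these readings of the caption, and the function name build_counter, are assumptions. In a secret-shared setting the products with $y$ and $1-y$ would use the multiplication protocol, and each counter entry is an independent sum, which is one reason such a layout suits GPUs.

```python
def build_counter(rows: list[list[int]], d: int) -> list[list[int]]:
    """Fill the 3 x 2(d-1) counter array C described in Figure 2.

    Assumed layout: each row has d entries, features first and the binary
    label last; v_{k,j} is read as "feature k has value j".
    """
    n_feat = d - 1
    C = [[0] * (2 * n_feat) for _ in range(3)]
    for row in rows:
        y = row[-1]
        for k in range(n_feat):
            x = row[k]
            ind = (1 - x, x)  # indicators of v_{k,0} and v_{k,1}; no data-dependent branch
            for j in (0, 1):
                col = 2 * k + j
                C[0][col] += ind[j]            # samples containing v_{k,j}
                C[1][col] += ind[j] * (1 - y)  # (v_{k,j}, 0) pairs
                C[2][col] += ind[j] * y        # (v_{k,j}, 1) pairs
    return C


if __name__ == "__main__":
    data = [[0, 1, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0]]  # d = 3: two features + label
    print(build_counter(data, 3))  # [[2, 2, 2, 2], [2, 0, 1, 1], [0, 2, 1, 1]]
```

By construction, the first row equals the sum of the other two, which gives a quick sanity check on the counts.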

Theorems & Definitions (10)

Theorem 1
proof
Theorem 2
proof
Theorem 3
proof
Theorem 4
proof
Theorem 5
proof
