Ents: An Efficient Three-party Training Framework for Decision Trees by Communication Optimization

Guopeng Lin, Weili Han, Wenqiang Ruan, Ruisheng Zhou, Lushan Song, Bingshuai Li, Yunfeng Shao

TL;DR

Ents tackles the high communication and computational costs of privacy-preserving three-party training of decision trees. It introduces two key optimizations: (i) secure radix sort-based protocols that update pre-generated permutations to align with group-wise training, so that cost grows linearly with tree height, and (ii) an efficient share-conversion protocol that moves computations between a small ring and a large ring, so that the larger bit length is used only for the operations that need it. Empirical results on eight UCI datasets show that Ents reduces communication sizes by 5.5× to 9.3×, reduces communication rounds by 3.9× to 5.3×, and shortens training time by 3.5× to 6.7×. Securely training a decision tree on the Skin Segmentation dataset (over 245k samples) in the WAN setting takes under three hours. Together, the reduced communication and the ring conversion make privacy-preserving decision-tree training more practical for industrial use while maintaining accuracy comparable to plaintext baselines.

Abstract

Multi-party training frameworks for decision trees based on secure multi-party computation enable multiple parties to train high-performance models on distributed private data with privacy preservation. The training process essentially involves frequent dataset splitting according to the splitting criterion (e.g. Gini impurity). However, existing multi-party training frameworks for decision trees demonstrate communication inefficiency due to the following issues: (1) They suffer from huge communication overhead in securely splitting a dataset with continuous attributes. (2) They suffer from huge communication overhead due to performing almost all the computations on a large ring to accommodate the secure computations for the splitting criterion. In this paper, we are motivated to present an efficient three-party training framework, namely Ents, for decision trees by communication optimization. For the first issue, we present a series of training protocols based on the secure radix sort protocols to efficiently and securely split a dataset with continuous attributes. For the second issue, we propose an efficient share conversion protocol to convert shares between a small ring and a large ring to reduce the communication overhead incurred by performing almost all the computations on a large ring. Experimental results from eight widely used datasets show that Ents outperforms state-of-the-art frameworks by $5.5\times \sim 9.3\times$ in communication sizes and $3.9\times \sim 5.3\times$ in communication rounds. In terms of training time, Ents yields an improvement of $3.5\times \sim 6.7\times$. To demonstrate its practicality, Ents requires less than three hours to securely train a decision tree on a widely used real-world dataset (Skin Segmentation) with more than 245,000 samples in the WAN setting.
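To make the splitting step concrete, here is a minimal plaintext sketch of choosing a threshold on one continuous attribute by weighted Gini impurity. It is not the secure protocol from the paper: it operates on cleartext values, uses the textbook Gini impurity rather than any variant the paper may define, and the function names `gini` and `best_threshold` are hypothetical. It only shows why sorting by the attribute is the natural first step, which is the part the secure radix sort protocols are said to address.

```python
from collections import Counter


def gini(labels):
    """Textbook Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value < threshold`.

    The threshold is None if no split improves on the unsplit impurity.
    """
    # Sort samples by attribute value so every candidate split is a prefix.
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    n = len(xs)

    best = (None, gini(ys))
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # equal values cannot be separated by a threshold
        left, right = ys[:i], ys[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, score)
    return best


if __name__ == "__main__":
    print(best_threshold([2.0, 8.0, 3.0, 9.0], [0, 1, 0, 1]))
```

In a secure setting the values, labels, and sort order would all be secret-shared, so each step above (sorting, counting, comparing scores) becomes an interactive protocol; that is where the communication cost described in the abstract arises.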

Paper Structure (44 sections, 1 theorem, 1 equation, 3 figures, 7 tables, 11 algorithms)

Key Result

Theorem 1

Let $c$ be an integer satisfying $c < \mathscr{k} < \ell - 1$. Let $d \in [0, 2^{\mathscr{k}+1})$ and $d_0, d_1 \in [0, 2^{\mathscr{k}})$ satisfy $d_0 + d_1 = d$. Then $(d_0 \gg c) + (-((-d_1) \gg c)) = \lfloor d / 2^c \rfloor + bit$, where $bit \in \{0, 1\}$.
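The identity can be checked numerically for small parameters. The sketch below assumes that $\gg$ acts as an arithmetic right shift on signed values, that is, floor division by $2^c$; the statement above does not spell this out, so treat it as an assumption. It uses Python's unbounded integers, whose `>>` floors on negative numbers, and therefore does not model the ring of bit length $\ell$ (the condition $\mathscr{k} < \ell - 1$ is assumed to keep all values within range). The function names are hypothetical.

```python
def truncated_sum(d0, d1, c):
    """Left-hand side of Theorem 1: (d0 >> c) + (-((-d1) >> c)).

    Python's >> floors toward negative infinity, so on a negative
    operand it behaves like an arithmetic shift.
    """
    return (d0 >> c) + (-((-d1) >> c))


def check_theorem(k):
    """Exhaustively test the identity for 0 <= c < k and d0, d1 in [0, 2^k)."""
    for c in range(k):
        for d0 in range(2 ** k):
            for d1 in range(2 ** k):
                d = d0 + d1
                # bit is the gap between the local truncations and floor(d / 2^c).
                bit = truncated_sum(d0, d1, c) - (d >> c)
                if bit not in (0, 1):
                    return False
    return True


if __name__ == "__main__":
    print(check_theorem(6))
```

Under this reading the left-hand side equals $\lfloor d_0 / 2^c \rfloor + \lceil d_1 / 2^c \rceil$, which explains the one-bit error: rounding one summand down and the other up overshoots $\lfloor d / 2^c \rfloor$ by at most 1 and never undershoots it.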

Figures (3)

  • Figure 1: Examples of group-wise protocols. We use both dashed lines and colors to distinctly delineate the group boundaries.
  • Figure 2: Online training time (seconds), communication sizes (MBs), and communication rounds of Ents, Hamada et al.'s framework-radixsort, Hamada et al.'s framework-convert, and Hamada et al.'s framework. 'Skin Seg' refers to Skin Segmentation.
  • Figure 3: An example to show the key steps to train a decision tree with height more than two. Vectors colored orange indicate that elements are stored in their original positions. Vectors colored green, blue, and gray indicate that elements belonging to the same nodes are stored consecutively. Additionally, we use simulated values for the modified Gini vectors, because the real values contain too many digits.

Theorems & Definitions (1)

  • Theorem 1