Reducing Communication for Split Learning by Randomized Top-k Sparsification

Fei Zheng; Chaochao Chen; Lingjuan Lyu; Binhui Yao

TL;DR

This work tackles the high communication cost of split learning in vertical federated settings by evaluating several compression strategies and introducing randomized top-$k$ sparsification (RandTopk). RandTopk selects top-$k$ elements with high probability while also sampling non-top-$k$ elements with probability $\alpha$, mitigating local minima and balancing neuron usage to improve convergence and generalization under fixed compression. The authors show, across multiple datasets and models, that RandTopk consistently outperforms size reduction, standard top-$k$, quantization, and L1 regularization at the same compression level, approaching or even matching non-sparse performance in some cases. The results demonstrate RandTopk as a practical method to reduce inter-party communication in split learning while maintaining strong predictive accuracy, with privacy considerations discussed and future work proposed to combine quantization and sparsification further.

Abstract

Split learning is a simple solution for Vertical Federated Learning (VFL), which has drawn substantial attention in both research and application due to its simplicity and efficiency. However, communication efficiency is still a crucial issue for split learning. In this paper, we investigate multiple communication reduction methods for split learning, including cut layer size reduction, top-k sparsification, quantization, and L1 regularization. Through analysis of cut layer size reduction and top-k sparsification, we further propose randomized top-k sparsification, which helps the model generalize and converge better. It selects top-k elements with a large probability while retaining a small probability of selecting non-top-k elements. Empirical results show that, compared with other communication-reduction methods, our proposed randomized top-k sparsification achieves better model performance under the same compression level.
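The selection rule described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that the top-k elements are ranked by magnitude, that they share a total sampling weight of 1 - alpha while the non-top-k elements share alpha, and that exactly k indices are drawn without replacement. The names `rand_topk_mask` and `rand_topk_sparsify` are hypothetical.

```python
import numpy as np


def rand_topk_mask(x, k, alpha, rng):
    """Boolean mask selecting exactly k entries of the 1-D array x.

    Assumption (not taken from the paper text): top-k entries by magnitude
    share per-draw weight 1 - alpha, all other entries share alpha.
    """
    d = x.shape[0]
    if not 0 < k < d:
        raise ValueError("k must satisfy 0 < k < len(x)")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")

    # Indices of the k largest-magnitude entries (order within them is arbitrary).
    top = np.argpartition(-np.abs(x), k - 1)[:k]

    mask = np.zeros(d, dtype=bool)
    if alpha == 0.0:
        # No randomness: this reduces to plain top-k sparsification.
        mask[top] = True
        return mask

    # Per-draw weights: they sum to 1 by construction.
    probs = np.full(d, alpha / (d - k))
    probs[top] = (1.0 - alpha) / k

    # Draw k distinct indices according to the weights.
    chosen = rng.choice(d, size=k, replace=False, p=probs)
    mask[chosen] = True
    return mask


def rand_topk_sparsify(x, k, alpha, rng):
    """Return (indices, values): the compressed message for one activation vector."""
    mask = rand_topk_mask(x, k, alpha, rng)
    indices = np.flatnonzero(mask)
    return indices, x[indices]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activations = np.array([0.1, -3.0, 0.5, 2.0, -0.2, 0.05])
    idx, vals = rand_topk_sparsify(activations, k=2, alpha=0.1, rng=rng)
    print(idx, vals)
```

Only the k selected indices and their values would need to be transmitted between parties, which is where the communication saving comes from. Setting `alpha=0.0` in this sketch recovers standard top-k.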

Paper Structure (29 sections, 16 equations, 8 figures, 8 tables)


Introduction
Related Work
Reducing Communication for HFL
Reducing Communication for VFL
Basic Compression for Split Learning
Basic Compression Methods
Compressed Size
Summary
Randomized Top-$k$ Sparsification for SL
Analysis of Top-$k$ and Size Reduction
Larger margin, better generalization.
Top-$k$ has a larger margin due to a larger feature space.
Adding Randomness to Top-$k$
Better convergence.
Better generalization.
...and 14 more sections

Figures (8)

Figure 1: Overview of split learning.
Figure 2: The loss surface, gradient field, and learning trajectory of the toy example.
Figure 3: Convergence speed of different methods under certain compression levels. Compression rates can be found in the main results table. Top row: accuracy vs. number of epochs. Bottom row: accuracy vs. communication (total size of the messages transferred, where vanilla split learning in one epoch = 1). Methods that are not applicable or fail to converge under the given compression level are omitted.
Figure 4: Training loss and generalization error on CIFAR-100. The compressed size is 2.86%.
Figure 5: Distribution of top-$k$ neurons during inference.
...and 3 more figures
