Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

Yongjeong Oh; Jaeho Lee; Christopher G. Brinton; Yo-Seb Jeon

TL;DR

SplitFC tackles the high communication overhead in distributed split learning by introducing dispersion-aware adaptive compression of intermediate vectors. It combines two complementary strategies: adaptive feature-wise dropout, which probabilistically drops less informative feature vectors based on their dispersion, and adaptive feature-wise quantization, which applies a two-stage quantizer to high-range vectors and a mean-value quantizer to the rest, with a closed-form optimization for quantization levels. The framework includes a principled bit-budget allocation via convex (water-filling-like) optimization and a method to select the number of vectors to quantize, M, to balance dimensionality reduction and quantization error. Empirical results on MNIST, CIFAR-100, and CelebA demonstrate substantial reductions in uplink/downlink traffic while preserving or improving accuracy. The best performance is observed when dropout and quantization are used together and the level of dimensionality reduction, R, is set to its best-performing value.

Abstract

This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-100, and CelebA datasets demonstrate that SplitFC outperforms state-of-the-art SL frameworks by significantly reducing communication overheads while maintaining high accuracy.
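The first strategy in the abstract can be illustrated with a minimal sketch. The abstract only states that dropout probabilities are determined by the standard deviation of the feature vectors, so the specific rule used here (keep probability proportional to the column's standard deviation, scaled so that roughly `len(columns) / reduction_ratio` columns survive) is an assumption made for illustration, not the paper's formula. The function names `keep_probabilities`, `sample_keep_mask`, and `apply_mask` are hypothetical.

```python
import random
import statistics


def keep_probabilities(columns, reduction_ratio):
    """Keep probability per feature column, growing with its dispersion.

    Assumption (not taken from the paper): probabilities are proportional to
    the column's population standard deviation and scaled so that, before
    clipping at 1, their sum equals len(columns) / reduction_ratio.
    """
    sigmas = [statistics.pstdev(col) for col in columns]
    total = sum(sigmas)
    if total == 0.0:
        # All columns are constant: fall back to a uniform keep probability.
        return [1.0 / reduction_ratio] * len(columns)
    target = len(columns) / reduction_ratio
    return [min(1.0, target * s / total) for s in sigmas]


def sample_keep_mask(probs, rng):
    """Draw one Bernoulli keep/drop decision per column."""
    return [rng.random() < p for p in probs]


def apply_mask(columns, keep_mask):
    """Return only the columns whose mask entry is True."""
    return [col for col, keep in zip(columns, keep_mask) if keep]


if __name__ == "__main__":
    rng = random.Random(0)
    # Four hypothetical feature columns with increasing dispersion.
    features = [[rng.gauss(0.0, s) for _ in range(32)] for s in (0.0, 0.5, 1.0, 2.0)]
    gradients = [[0.1] * 32 for _ in features]

    probs = keep_probabilities(features, reduction_ratio=2)
    mask = sample_keep_mask(probs, rng)
    kept_features = apply_mask(features, mask)
    # By the chain rule, gradients of dropped features are dropped as well,
    # so the same mask is reused on the downlink.
    kept_gradients = apply_mask(gradients, mask)
    print(probs, mask, len(kept_features), len(kept_gradients))
```

Reusing a single mask for both directions mirrors the abstract's point that the gradient vectors associated with dropped features never need to be transmitted. The sketch does not rescale the surviving columns; whether and how to rescale is left open here.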

Paper Structure (20 sections, 1 theorem, 27 equations, 4 figures, 3 tables, 3 algorithms)

Introduction
Prior Works
Contributions
Related Works
System Model
A Typical SL Framework
Key Challenge in SL
Motivation of SplitFC
Adaptive Feature-Wise Dropout Strategy
Basic Compression Process of Feature-Wise Dropout
Design of Dropout Probability
Adaptive Feature-Wise Quantization Strategy
Quantizer Design
Two-stage quantizer
Mean-value quantizer
...and 5 more sections

Key Result

Theorem 1

The optimal solution of the problem $({\bf P})$ is for all $l\in\{0,\ldots,M\}$, where $u_0^\star = \frac{\tilde{a}_0^2B\log 2}{\nu^\star}$, $u_j^\star = \frac{\tilde{a}_j^2\log 2}{2\nu^\star}, j\in\{1,\ldots,M\}$, $v_l^\star = (u_l^\star \sqrt{81 - 12u_l^\star} + 9u_l^\star)^{\frac{1}{3}}, \forall l$, and $\nu^\star$ is the optimal Lagrange multiplier ...
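The auxiliary quantities in Theorem 1 can be evaluated numerically with a short sketch. It covers only $u_l^\star$ and $v_l^\star$ exactly as written above. The excerpt does not show the expression for the optimal solution itself or how $\nu^\star$ is found, so $\nu^\star$ is treated as a given input, and $\log$ is assumed to be the natural logarithm. The function names and the sample values for $\tilde{a}_l$, $B$, and $\nu^\star$ are hypothetical.

```python
import math


def u_star(a_tilde, batch_size, nu):
    """Return [u_0, u_1, ..., u_M] from the coefficients a_tilde[0..M].

    u_0 = a_0^2 * B * log(2) / nu
    u_j = a_j^2 * log(2) / (2 * nu)   for j = 1..M
    Assumption: log is the natural logarithm and nu > 0 is already known.
    """
    if nu <= 0.0:
        raise ValueError("nu must be positive")
    u0 = a_tilde[0] ** 2 * batch_size * math.log(2) / nu
    rest = [a ** 2 * math.log(2) / (2.0 * nu) for a in a_tilde[1:]]
    return [u0] + rest


def v_star(u):
    """Return v = (u * sqrt(81 - 12 u) + 9 u)^(1/3).

    The square root is real only for u <= 81 / 12 = 6.75, and u is a
    non-negative quantity by construction, so other inputs are rejected.
    """
    if not 0.0 <= u <= 81.0 / 12.0:
        raise ValueError("u must lie in [0, 6.75] for a real-valued v")
    return (u * math.sqrt(81.0 - 12.0 * u) + 9.0 * u) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formulas.
    a_tilde = [0.5, 1.0, 2.0]
    us = u_star(a_tilde, batch_size=8, nu=2.0)
    vs = [v_star(u) for u in us]
    for l, (u, v) in enumerate(zip(us, vs)):
        print(f"l={l}: u={u:.4f}, v={v:.4f}")
```

The range check in `v_star` makes explicit a constraint implied by the formula: when $u_l^\star > 6.75$ the term under the square root turns negative, so a candidate $\nu^\star$ that produces such a value falls outside what this sketch handles.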

Figures (4)

Figure 2: Illustration of the proposed SL framework with adaptive feature-wise dropout and quantization strategies.
Figure 3: Classification accuracy of SplitFC-based frameworks for the MNIST dataset with different values of $R$.
Figure 4: Classification accuracy of SplitFC for the MNIST dataset with various choices of $R$ when $C_{\rm e,d} = 0.4$ bits/entry.
Figure 5: Classification accuracy of SplitFC with and without the quantization level optimization with $C_{\rm e,d} = 0.2$ bits/entry and $R=8$.
