Communication-efficient Vertical Federated Learning via Compressed Error Feedback

Pedro Valdeira; João Xavier; Cláudia Soares; Yuejie Chi

TL;DR

This work introduces EF-VFL, a communication-efficient vertical federated learning method that uses error feedback to stabilize compressed updates when training split neural networks. Using contractive compressors with an EF21-style surrogate, EF-VFL achieves an $O(1/T)$ convergence rate for smooth nonconvex objectives without requiring the compression error to vanish, matching the rate of uncompressed methods when the mini-batch is sufficiently large; under the PL condition, it attains linear convergence to a small neighborhood. The method also accommodates private labels, and it improves on prior compressed VFL approaches both in theory ($O(1/T)$ versus $O(1/\sqrt{T})$) and in experiments on MNIST, ModelNet10, and CIFAR-100 with top-$k$ sparsification and stochastic quantization at several compression levels. These results indicate that error feedback lowers communication cost while preserving or improving predictive performance for feature-partitioned data holders.

Abstract

Communication overhead is a known bottleneck in federated learning (FL). To address this, lossy compression is commonly used on the information communicated between the server and clients during training. In horizontal FL, where each client holds a subset of the samples, such communication-compressed training methods have recently seen significant progress. However, in their vertical FL counterparts, where each client holds a subset of the features, our understanding remains limited. To close this gap, we propose an error feedback compressed vertical federated learning (EF-VFL) method to train split neural networks. In contrast to previous communication-compressed methods for vertical FL, EF-VFL does not require a vanishing compression error for the gradient norm to converge to zero for smooth nonconvex problems. By leveraging error feedback, our method can achieve an $\mathcal{O}(1/T)$ convergence rate for a sufficiently large batch size, improving over the state-of-the-art $\mathcal{O}(1/\sqrt{T})$ rate under $\mathcal{O}(1/\sqrt{T})$ compression error, and matching the rate of uncompressed methods. Further, when the objective function satisfies the Polyak-Łojasiewicz inequality, our method converges linearly. In addition to improving convergence, our method also supports the use of private labels. Numerical experiments show that EF-VFL significantly improves over the prior art, confirming our theoretical results. The code for this work can be found at https://github.com/Valdeira/EF-VFL.
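The error-feedback idea in the abstract can be illustrated with a minimal sketch. It assumes an EF21-style surrogate, in which the sender transmits a compressed correction toward the current target rather than the target itself, and it uses top-$k$ sparsification as the compressor. The names `top_k`, `ef_surrogate_step`, and `run_demo` are hypothetical and are not taken from the authors' repository; the fixed random `target` merely stands in for an embedding that changes slowly between iterations.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out

def ef_surrogate_step(surrogate, target, k):
    """EF21-style update (assumed form): compress the gap, then add it."""
    message = top_k(target - surrogate, k)  # only this is communicated
    return surrogate + message, message

def run_demo(dim=20, k=4, steps=6, seed=0):
    """Track how the surrogate error shrinks for a fixed target."""
    rng = np.random.default_rng(seed)
    target = rng.normal(size=dim)  # illustrative stand-in for an embedding
    surrogate = np.zeros(dim)
    errors = [float(np.linalg.norm(target - surrogate))]
    for _ in range(steps):
        surrogate, _ = ef_surrogate_step(surrogate, target, k)
        errors.append(float(np.linalg.norm(target - surrogate)))
    return errors
```

Because each message carries the largest remaining entries of the gap, the surrogate error never increases for a fixed target, even though every individual message is heavily compressed. This is the intuition behind not needing the compression error to vanish; the actual algorithm applies the idea to embeddings that change as the models are updated.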

Paper Structure (36 sections, 4 theorems, 73 equations, 6 figures, 2 tables, 2 algorithms)

Introduction
Our contributions
Related work
Communication-efficient FL.
Vertical FL.
Preliminaries
Error feedback
Problem setup
Proposed method
Forward pass.
Backward pass.
Mini-batch.
Adapting our method for handling private labels
Convergence guarantees
Nonconvex setting
...and 21 more sections

Key Result

Lemma 1 (Surrogate offset bound)

If $\Phi$ is $L$-smooth and $\{\bm{H}_k\}$ have bounded derivatives, then, for all $t \geq 0$,

Figures (6)

Figure 1: An illustration of an iteration of the EF-VFL algorithm. Step (1) concerns the model update and step (2) concerns the surrogate update.
Figure 2: Relative squared norm of the training gradient with respect to the number of epochs, and validation accuracy with respect to the communication cost, for the training of a shallow neural network on MNIST. On the left, CVFL and EF-VFL employ top-$k$ sparsification with a decreasing $k$ across rows. On the right, they employ stochastic quantization with a decreasing number of bits across rows. SVFL is the same throughout.
Figure 3: Training loss with respect to the number of epochs and validation accuracy with respect to the communication cost for the training of an MVCNN on ModelNet10. On the left, CVFL and EF-VFL employ top-$k$ sparsification with a decreasing $k$ across rows. On the right, they employ stochastic quantization with a decreasing number of bits across rows. SVFL is the same throughout.
Figure 4: Training loss with respect to the number of epochs and validation accuracy with respect to the communication cost for the training of a ResNet18-based model on CIFAR-100. On the left, CVFL and EF-VFL employ top-$k$ sparsification with a decreasing $k$ across rows. On the right, they employ stochastic quantization with a decreasing number of bits across rows. SVFL is the same throughout.
Figure 5: Training loss with respect to the number of epochs and validation accuracy with respect to the communication cost for the training of a shallow neural network on MNIST and a ResNet18-based model on CIFAR-100. In the legend, PL stands for private labels. The communication-compressed methods, namely CVFL, EF-VFL, CVFL (PL), and EF-VFL (PL), employ top-$k$ sparsification.
...and 1 more figure
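The captions above name stochastic quantization as the second compressor. The exact quantizer used in the experiments is not given in this summary, so the following is only a sketch of one common variant: each entry is scaled by the largest magnitude in the vector and randomly rounded to a uniform grid so that the result is unbiased. The function name `stochastic_quantize` and the choice of `2**bits - 1` grid intervals are assumptions made for illustration.

```python
import numpy as np

def stochastic_quantize(v, bits, rng):
    """Randomly round |v| / max|v| onto a uniform grid (assumed variant)."""
    scale = float(np.max(np.abs(v)))
    if scale == 0.0:
        return np.zeros_like(v)
    levels = 2 ** bits - 1                    # number of grid intervals
    scaled = np.abs(v) / scale * levels       # values in [0, levels]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part (unbiased).
    round_up = rng.random(v.shape) < (scaled - lower)
    return np.sign(v) * (lower + round_up) / levels * scale
```

Fewer bits give a coarser grid and a cheaper message, which corresponds to moving down the rows of the right-hand panels. An unbiased quantizer of this kind is not automatically a contractive compressor; whether it meets that condition depends on the dimension, the number of bits, and any rescaling applied.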

Theorems & Definitions (9)

Definition 1: Contractive compressor
Lemma 1: Surrogate offset bound
proof
Lemma 2: Recursive distortion bound
proof
Theorem 1
proof
Theorem 2
proof
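
For reference, Definition 1 is listed here by title only. The form below is the standard definition of a contractive compressor from the error-feedback literature; the symbols $\mathcal{C}$, $\alpha$, and $d$ are assumed notation, and the paper's own statement may differ in detail.

```latex
% Standard contractive-compressor condition (assumed notation).
A map $\mathcal{C} \colon \mathbb{R}^d \to \mathbb{R}^d$ is a contractive
compressor if there exists $\alpha \in (0, 1]$ such that
\[
  \mathbb{E}\,\bigl\| \mathcal{C}(v) - v \bigr\|^2
  \;\le\; (1 - \alpha)\, \| v \|^2
  \qquad \text{for all } v \in \mathbb{R}^d .
\]
% Example: top-$k$ sparsification satisfies this with $\alpha = k/d$.
```

Top-$k$ sparsification is deterministic, so the expectation is trivial in that case: the $d-k$ discarded entries are the smallest in magnitude, and their squared sum is at most a $(1 - k/d)$ fraction of $\|v\|^2$.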
