FedMPQ: Secure and Communication-Efficient Federated Learning with Multi-codebook Product Quantization

Xu Yang, Jiapeng Zhang, Qifeng Zhang, Zhuo Tang

TL;DR

This work tackles the uplink bottleneck and privacy concerns in cross-device federated learning by introducing FedMPQ, a secure, communication-efficient framework based on multi-codebook product quantization. It uses public data and previous-round updates to generate multiple codebooks, adds flexibility through a residual error pruning mechanism, and performs aggregation inside a trusted execution environment (TEE) or at a trusted third party (TTP) to prevent gradient leakage. The approach achieves substantial uplink compression with minimal accuracy loss and shows improved robustness to non-IID data on LEAF benchmarks such as CelebA and Femnist. Overall, FedMPQ offers a practical, scalable solution for secure, bandwidth-limited FL in real-world wireless networks.

Abstract

In federated learning, particularly in cross-device scenarios, secure aggregation has recently gained popularity because it effectively defends against inference attacks by malicious aggregators. However, secure aggregation often incurs additional communication overhead and can slow the convergence of the global model, which is particularly problematic in wireless network environments with extremely limited bandwidth. Achieving efficient communication compression under the constraint of secure aggregation is therefore a challenging and valuable problem. In this work, we propose a novel uplink communication compression method for federated learning, named FedMPQ, which is based on product quantization with multiple shared codebooks. Specifically, we utilize updates from the previous round to generate sufficiently robust codebooks. Secure aggregation is then achieved through a trusted execution environment (TEE) or a trusted third party (TTP). In contrast to previous works, our approach is more robust in scenarios where data is not independently and identically distributed (non-IID) and where sufficient public data is lacking. Experiments on the LEAF dataset demonstrate that our proposed method achieves 99% of the baseline's final accuracy while reducing uplink communication by 90-95%.
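
To make the compression idea concrete, the following is a minimal sketch of generic product quantization applied to one flattened client update, together with a top-fraction residual upload. It is not the authors' implementation: the function names, the toy codebooks, and the rule of keeping the largest-magnitude residual entries are assumptions made for illustration. Here the number of codebooks corresponds to $M$, the number of codewords per codebook to $K$, and the codeword length to $D$.

```python
# Illustrative sketch of generic product quantization (PQ); not FedMPQ itself.
# All names below are hypothetical.

def split(vec, m):
    """Split a flat vector into m equal-length sub-vectors."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vec, codebooks):
    """Return, for each sub-vector, the index of its nearest codeword."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen codewords."""
    out = []
    for k, cb in zip(codes, codebooks):
        out.extend(cb[k])
    return out

def top_residual(vec, approx, ratio):
    """Keep the largest-magnitude fraction of residual entries (assumed rule)."""
    residual = [v - a for v, a in zip(vec, approx)]
    keep = max(1, int(len(vec) * ratio))
    order = sorted(range(len(residual)),
                   key=lambda i: abs(residual[i]), reverse=True)
    return {i: residual[i] for i in order[:keep]}

# Toy setting: M = 2 codebooks, K = 2 codewords each, D = 2 values per codeword.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [-1.0, -1.0]],
]
update = [0.9, 1.1, -0.2, 0.1]

codes = pq_encode(update, codebooks)        # what a client would upload
approx = pq_decode(codes, codebooks)        # what the aggregator reconstructs
sparse = top_residual(update, approx, 0.25) # optional sparse correction

print(codes, approx, sparse)
```

In this toy setting the client sends two small integer indices plus one residual entry instead of four floating-point values. How FedMPQ actually builds its codebooks and selects residual entries is described in the paper itself.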

Paper Structure

This paper contains 26 sections, 8 equations, 6 figures, 1 table, 3 algorithms.

Figures (6)

  • Figure 1: Gradient leak. The gradient updates from clients are exposed to the server, which can potentially infer certain features of the training data by analyzing these updates. Here, $D$ denotes the client-customized reconstruction operation.
  • Figure 2: The training process of the FedMPQ framework. In each training round, $N$ clients participate, and each client uploads its residual error, codes, and pseudo-centroids. For further details, see the section on the proposed method. Best viewed in color.
  • Figure 3: The relationship between model accuracy and communications, with both FedMPQ and PQ employing the same limited public dataset for codebook training, and both uploading 0.1% of residual error. "Uncompressed" denotes the results obtained without any compression.
  • Figure 4: Model Accuracy vs. Total communication cost. $M$ is the number of codebooks, $K$ represents the number of codewords in each codebook, and $D$ stands for the length of each codeword.
  • Figure 5: Effects of the residual error upload ratio. On the CelebA dataset, we set $M=8, K=8, D=4$. On the Femnist dataset, we set $M=4, K=32, D=4$. Here, $M$, $K$, and $D$ have the same meanings as in the product quantization section.
  • ...and 1 more figure