FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning

Nam Hyeon-Woo, Moon Ye-Bin, Tae-Hyun Oh

TL;DR

FedPara tackles the communication bottleneck in federated learning (FL) by re-parameterizing neural network layers with a low-rank Hadamard product, which retains close to full expressiveness while transmitting far fewer parameters. The core parameterization, W = (X1Y1^T) ⊙ (X2Y2^T), substantially reduces parameter count and communication cost while preserving, and in some cases improving, accuracy in IID and non-IID settings. A personalized variant, pFedPara, further splits the parameters into global and local components for robust non-IID performance. The approach is compatible with existing FL optimizers and extends to various architectures, including CNNs and LSTMs, with 3.4× to 10× communication savings demonstrated in experiments. Overall, FedPara and pFedPara offer practical, scalable improvements for edge devices and heterogeneous networks, with implications for energy use and the accessibility of FL.

Abstract

In this work, we propose a communication-efficient parameterization, FedPara, for federated learning (FL) to overcome the burden of frequent model uploads and downloads. Our method re-parameterizes the weight parameters of layers using low-rank weights followed by the Hadamard product. Unlike the conventional low-rank parameterization, FedPara is not restricted by low-rank constraints, and therefore has a far larger capacity. This property enables comparable performance while requiring 3 to 10 times lower communication cost than the model with the original layers, which traditional low-rank methods cannot achieve. The efficiency of our method can be further improved by combining it with other efficient FL optimizers. In addition, we extend our method to a personalized FL application, pFedPara, which separates parameters into global and local ones. We show that pFedPara outperforms competing personalized FL methods with more than three times fewer parameters.
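
The capacity claim can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the sizes `m`, `n`, `R`, the random Gaussian factors, and all variable names are assumptions chosen for illustration. It builds a conventional low-rank matrix and a Hadamard-product matrix from the same number of parameters, then compares their ranks.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 100, 100, 5  # illustrative sizes (assumption)

# Conventional low-rank: W = X Y^T with inner dimension 2R,
# which uses 2R(m + n) parameters and has rank at most 2R.
X = rng.standard_normal((m, 2 * R))
Y = rng.standard_normal((n, 2 * R))
W_lowrank = X @ Y.T

# Hadamard-product form: W = (X1 Y1^T) * (X2 Y2^T), elementwise product.
# Same 2R(m + n) parameters, but rank can reach R^2.
X1, Y1 = rng.standard_normal((m, R)), rng.standard_normal((n, R))
X2, Y2 = rng.standard_normal((m, R)), rng.standard_normal((n, R))
W_fedpara = (X1 @ Y1.T) * (X2 @ Y2.T)

params_full = m * n
params_lowrank = X.size + Y.size
params_fedpara = X1.size + Y1.size + X2.size + Y2.size

rank_lowrank = np.linalg.matrix_rank(W_lowrank)
rank_fedpara = np.linalg.matrix_rank(W_fedpara)

print(f"parameters: full={params_full}, low-rank={params_lowrank}, "
      f"Hadamard={params_fedpara}")
print(f"rank: low-rank={rank_lowrank} (bound {2 * R}), "
      f"Hadamard={rank_fedpara} (bound {R * R})")
```

With randomly drawn factors, the Hadamard-product matrix typically attains a rank well above the 2R ceiling of the plain low-rank matrix, even though both are built from the same parameter budget.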

Paper Structure

This paper contains 46 sections, 4 theorems, 9 equations, 8 figures, 12 tables, 2 algorithms.

Key Result

Proposition 1

Let $\mathbf{X}_1 \in \mathbb R^{m \times r_1}, \mathbf{X}_2 \in \mathbb R^{m \times r_2}, \mathbf{Y}_1 \in \mathbb R^{n \times r_1}, \mathbf{Y}_2 \in \mathbb R^{n \times r_2}$, $r_1, r_2 \le \min(m, n)$ and the constructed matrix be $\mathbf{W} :=(\mathbf{X}_1 \mathbf{Y}_1^\top) \odot (\mathbf{X}_2

Figures (8)

  • Figure 1: Illustrations of low-rank matrix parameterization and $\texttt{FedPara}$ with the same number of parameters $2R(m+n)$. (a) Low-rank parameterization is the sum of $2R$ rank-$1$ matrices, $\mathbf{W} = \mathbf{X}\mathbf{Y}^\top$, and $\mathrm{rank}(\mathbf{W})\leq2R$. (b) $\texttt{FedPara}$ is the Hadamard product of two low-rank inner matrices, $\mathbf{W}=\mathbf{W}_1 \odot \mathbf{W}_2 = (\mathbf{X}_1 \mathbf{Y}_1^\top) \odot (\mathbf{X}_2 \mathbf{Y}_2^\top)$, and $\mathrm{rank}(\mathbf{W})\leq R^2$.
  • Figure 2: Diagrams of (a) $\texttt{FedPer}$ and (b) $\texttt{pFedPara}$. The global part is transferred to the server and shared across clients, while the local part remains private in each client.
  • Figure 3: (a-f): Accuracy [%] ($y$-axis) vs. communication costs [GBytes] ($x$-axis) of $\texttt{VGG16}_\mathrm{ori.}$ and $\texttt{VGG16}_\texttt{FedPara}$. Broken line and solid line represent $\texttt{VGG16}_\mathrm{ori.}$ and $\texttt{VGG16}_\texttt{FedPara}$, respectively. (g): Size comparison of transferred parameters, which can be expressed as communication costs [GBytes] (left $y$-axis) or energy consumption [MJ] (right $y$-axis), for the same target accuracy. The white bars are the results of $\texttt{VGG16}_\mathrm{ori.}$ and the black bars are the results of $\texttt{VGG16}_\texttt{FedPara}$. The target accuracy is denoted in the parentheses under the $x$-axis of (g).
  • Figure 4: Test accuracy [%] ($y$-axis) vs. parameters ratio [%] ($x$-axis) of $\texttt{VGG16}_\texttt{FedPara}$ at the target rounds. The target rounds follow Table \ref{table:half_lstm}. The dotted line represents $\texttt{VGG16}_\mathrm{ori.}$ with no parameter reduction, and the solid line $\texttt{VGG16}_\texttt{FedPara}$ adjusted by $\gamma \in [0.1, 0.9]$ in 0.1 increments.
  • Figure 5: Average test accuracy over ten local models trained by each algorithm. (a) 100% of local training data on FEMNIST are used with the non-IID setting, which mimics having enough local data for training; each local model is evaluated on its own data. (b) 20% of local training data on FEMNIST are used with the non-IID setting, which mimics insufficient local data to train local models. (c) 100% of local training data on MNIST are used with the highly skewed non-IID setting, where each client has at most two classes. The error bars denote 95% confidence intervals obtained from 5 repeats.
  • ...and 3 more figures
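
The global/local split described for $\texttt{pFedPara}$ in Figure 2 can be sketched as one communication round in NumPy. This is a hedged illustration rather than the authors' code: the composition $\mathbf{W} = \mathbf{W}_1 \odot (\mathbf{W}_2 + 1)$, the choice of $\mathbf{X}_1, \mathbf{Y}_1$ as the global factors, the unweighted averaging, and every name below are assumptions, since this listing does not spell them out.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R, num_clients = 8, 6, 2, 3  # illustrative sizes (assumption)

GLOBAL_KEYS = ("X1", "Y1")  # assumed shared factors; X2, Y2 stay on the client


def init_client():
    """Create one client's factors (hypothetical helper)."""
    return {
        "X1": rng.standard_normal((m, R)), "Y1": rng.standard_normal((n, R)),
        "X2": rng.standard_normal((m, R)), "Y2": rng.standard_normal((n, R)),
    }


def compose(client):
    """Assumed composition W = W1 * (W2 + 1), with W1 global and W2 local."""
    w1 = client["X1"] @ client["Y1"].T
    w2 = client["X2"] @ client["Y2"].T
    return w1 * (w2 + 1.0)


clients = [init_client() for _ in range(num_clients)]

# Each client uploads only its global factors.
uploads = [{k: c[k] for k in GLOBAL_KEYS} for c in clients]

# The server averages the uploaded factors (plain unweighted mean, an assumption).
averaged = {k: np.mean([u[k] for u in uploads], axis=0) for k in GLOBAL_KEYS}

# Clients overwrite their global factors; local factors are untouched.
for c in clients:
    for k in GLOBAL_KEYS:
        c[k] = averaged[k].copy()

uploaded_per_client = sum(a.size for a in uploads[0].values())
total_per_client = sum(a.size for a in clients[0].values())
print(f"uploaded {uploaded_per_client} of {total_per_client} parameters per client")
```

Under these assumptions each client transmits half of its factors per round, while the private factors keep every client's composed weight distinct after aggregation.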

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Corollary 1