Communication-Efficient Personalized Distributed Learning with Data and Node Heterogeneity

Zhuojun Tian, Zhaoyang Zhang, Yiwei Li, Mehdi Bennis

TL;DR

The paper addresses data and node heterogeneity in decentralized AIoT by introducing the Distributed Strong Lottery Ticket Hypothesis (DSLTH) and a communication-efficient personalized learning framework. Each local model is represented as $\mathbf{v}_i = \mathbf{w} \odot \mathbf{m}_i$ with a fixed global $\mathbf{w}$ and a personalized binary mask $\mathbf{m}_i$, while structured sparsity is promoted via group sparsity regularization. A novel aggregation mechanism uses an intermediate aggregation tensor and a personalized fine-tuning step (MCE-PL) to fuse neighbor information without sacrificing device heterogeneity, and a theoretical DSLTH proof under non-i.i.d. conditions is provided. Empirical results on CIFAR-10 demonstrate improved convergence, personalization, and reduced communication cost across diverse topologies and heterogeneity settings, highlighting practical impact for scalable AIoT deployments.
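The representation $\mathbf{v}_i = \mathbf{w} \odot \mathbf{m}_i$ can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the shared-seed initialization, the function names (`make_global_weights`, `local_model`, `mask_payload_bytes`), and the random masks are assumptions made only for illustration. The last helper shows why exchanging binary masks is cheap compared with exchanging 32-bit weights.

```python
import numpy as np

def make_global_weights(shape, seed=0):
    """Fixed real-valued weights w. Assumption: every agent can regenerate
    the same w from a shared seed, so w itself is never transmitted."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).astype(np.float32)

def local_model(w, mask):
    """Local parameters v_i = w ⊙ m_i (element-wise product with a binary mask)."""
    assert mask.dtype == np.bool_ and mask.shape == w.shape
    return w * mask

def mask_payload_bytes(mask):
    """Bytes needed to send a binary mask when packed at 1 bit per entry."""
    return np.packbits(mask.ravel()).nbytes

w = make_global_weights((4, 8))

# Hypothetical personalized masks with different densities for two agents.
rng = np.random.default_rng(1)
m_1 = rng.random(w.shape) < 0.5
m_2 = rng.random(w.shape) < 0.3

v_1 = local_model(w, m_1)
v_2 = local_model(w, m_2)

print("mask payload (bytes):  ", mask_payload_bytes(m_1))
print("weight payload (bytes):", w.nbytes)
```

In this toy setting the packed mask is 32 times smaller than the float32 weight tensor of the same shape, which is the intuition behind the reduced communication cost; the paper's measured savings depend on its own protocol.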

Abstract

To jointly tackle the challenges of data and node heterogeneity in decentralized learning, we propose a distributed strong lottery ticket hypothesis (DSLTH), based on which a communication-efficient personalized learning algorithm is developed. In the proposed method, each local model is represented as the Hadamard product of global real-valued parameters and a personalized binary mask for pruning. The local model is learned by updating and fusing the personalized binary masks while the real-valued parameters are fixed among different agents. To further reduce the complexity of hardware implementation, we incorporate a group sparse regularization term in the loss function, enabling the learned local model to achieve structured sparsity. Then, a binary mask aggregation algorithm is designed by introducing an intermediate aggregation tensor and adding a personalized fine-tuning step in each iteration, which constrains model updates towards the local data distribution. The proposed method effectively leverages the relativity among agents while meeting personalized requirements in heterogeneous node conditions. We also provide a theoretical proof for the DSLTH, establishing it as the foundation of the proposed method. Numerical simulations confirm the validity of the DSLTH and demonstrate the effectiveness of the proposed algorithm.
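To make the group sparse regularization concrete, the sketch below computes a standard group-lasso penalty, $\lambda \sum_g \lVert \mathbf{v}_g \rVert_2$, on a masked weight tensor. The grouping (one group per row, i.e. per output unit), the weight `lam`, and the function names are assumptions for illustration; the abstract does not specify the exact form used in the paper.

```python
import numpy as np

def group_sparsity_penalty(v, lam=1.0):
    """Group-lasso penalty lam * sum_g ||v_g||_2.
    Assumption: each slice along axis 0 (e.g. one output unit) is a group."""
    v2d = v.reshape(v.shape[0], -1)
    return lam * np.linalg.norm(v2d, axis=1).sum()

def zero_group_fraction(mask):
    """Fraction of groups that are pruned entirely (structured sparsity)."""
    m2d = mask.reshape(mask.shape[0], -1)
    return float((~m2d.any(axis=1)).mean())

w = np.ones((4, 3), dtype=np.float32)

# Two masks that keep the same number of weights (6 of 12).
structured = np.array([[1, 1, 1],
                       [1, 1, 1],
                       [0, 0, 0],
                       [0, 0, 0]], dtype=bool)
scattered = np.array([[1, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0],
                      [0, 0, 1]], dtype=bool)

p_structured = group_sparsity_penalty(w * structured)
p_scattered = group_sparsity_penalty(w * scattered)
print(p_structured, p_scattered)
```

With equal numbers of kept weights, the penalty is smaller when whole groups are removed than when zeros are scattered, so minimizing it pushes the learned mask towards structured sparsity that is easier to exploit in hardware.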

Paper Structure

This paper contains 19 sections, 2 theorems, 34 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

(Theorem 1 in da2022proving for SLTH) Consider a convolutional network with $L$ layers, and let $\varepsilon$ and $C$ be two positive constants. Let $O_l, I_l\in\mathbb{N}$ with $I_l\ge CO_l\log\frac{O_{l-1}O_ld_l^2L}{\min\{\varepsilon, \delta\}}$. Define $\bm{w}_{2l-1}\in\mathbb{R}^{I_l\times O_{l-1}\times\cdots}$ [...] Its pruned version is defined as: [...] Then we can choose constant $C$ independently from other parameters [...]

Figures (9)

  • Figure 1: The decentralized communication network of agents with heterogeneous data and system capabilities.
  • Figure 2: Illustration of the lottery ticket hypothesis, the strong lottery ticket hypothesis, and the distributed strong lottery ticket hypothesis: In (a), (c) and (d), the weight parameters keep their initialized values, while the darker circles in (b) indicate trained weight parameters.
  • Figure 3: The algorithm flowchart of the proposed MCE-PL in the $k$-th iteration for node $i$: The updated and fused binary mask tensor $\bm{m}_i^{(k)}$ is obtained based on the aggregation tensor $\bm{y}_i^{(k)}$. $\bm{y}_i^{(k)}$ combines the information from $\bm{z}_i^{(k)}$ and the binary information $\bm{m}_j^{(k-1/2)}$ received from neighboring nodes $j\in\mathcal{N}_i$, where $\bm{z}_i^{(k)}$ is updated through back propagation and personalized fine-tuning.
  • Figure 4: The verification of the DSLTH: comparison of convergence curves between the weight-based update and the mask-based one considering $r_i=0.1, 0.3, 0.5$ in different agents.
  • Figure 5: The comparison of communication cost.
  • ...and 4 more figures
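The per-iteration flow described in the Figure 3 caption can be sketched as follows. This is a toy illustration under loudly stated assumptions, not the MCE-PL update rules, which this page does not give: here the aggregation tensor $\bm{y}_i$ is assumed to be a convex combination of the local tensor $\bm{z}_i$ and the mean of the neighbors' binary masks, and the new mask is assumed to keep the largest entries of $\bm{y}_i$. The names `aggregate`, `top_k_mask`, `alpha`, and `keep_ratio` are hypothetical, and the back-propagation and fine-tuning steps that produce $\bm{z}_i$ are omitted.

```python
import numpy as np

def aggregate(z_i, neighbor_masks, alpha=0.5):
    """Assumed aggregation: y_i = alpha * z_i + (1 - alpha) * mean_j m_j."""
    if not neighbor_masks:
        return z_i.copy()
    neighbor_mean = np.mean([m.astype(np.float32) for m in neighbor_masks], axis=0)
    return alpha * z_i + (1.0 - alpha) * neighbor_mean

def top_k_mask(y_i, keep_ratio):
    """Assumed binarization: keep the keep_ratio fraction of largest entries."""
    k = max(1, int(round(keep_ratio * y_i.size)))
    flat = y_i.ravel()
    keep = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep] = True
    return mask.reshape(y_i.shape)

# Local tensor of node i (taken as given here) and masks from two neighbors.
z_i = np.array([0.9, 0.1, 0.5, 0.2])
neighbor_masks = [
    np.array([True, False, False, True]),
    np.array([True, False, True, True]),
]

y_i = aggregate(z_i, neighbor_masks, alpha=0.5)
m_i = top_k_mask(y_i, keep_ratio=0.5)
print(y_i, m_i)
```

Only binary masks cross the network in this sketch, while each node can choose its own `keep_ratio`, which mirrors how the method accommodates nodes with different capabilities.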

Theorems & Definitions (4)

  • Remark 1
  • Remark 2
  • Lemma 1
  • Theorem 1