Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients

Yan Li; Xiao Zhang; Mingyi Li; Guangwei Xu; Feng Chen; Yuan Yuan; Yifei Zou; Mengying Zhao; Jianbo Lu; Dongxiao Yu

TL;DR

This paper tackles the challenge of training very large models across many resource-limited devices with heterogeneous data. It introduces $Co\text{-}S^2P$, a semi-asynchronous collaborative framework that combines data distribution-aware structured pruning, cross-block knowledge transfer via self-distillation, and a semi-asynchronous aggregation strategy to mitigate stragglers. The authors provide a convergence analysis showing a rate of $O(1/\sqrt{N^{*}EQ})$ under standard assumptions and report gains on real IoT hardware: up to 8.8% higher server accuracy, roughly 22% lower memory consumption, and about 24% shorter training time. The approach generalizes across vision and NLP tasks and scales to large models, indicating strong potential for resource-constrained, distributed training environments.

Abstract

In this work, we study how to unleash the potential of massive, heterogeneous, weak computing power to collaboratively train large-scale models on dispersed datasets. To improve both efficiency and accuracy in resource-adaptive collaborative learning, we take the first step toward addressing the \textit{unstructured pruning}, \textit{varying submodel architectures}, \textit{knowledge loss}, and \textit{straggler} challenges simultaneously. We propose a novel semi-asynchronous collaborative training framework, namely $Co\text{-}S^2P$, with data distribution-aware structured pruning and a cross-block knowledge transfer mechanism to address these concerns. Furthermore, we prove that $Co\text{-}S^2P$ achieves an asymptotically optimal convergence rate of $O(1/\sqrt{N^*EQ})$. Finally, we conduct extensive experiments on two types of tasks using a real-world hardware testbed of diverse IoT devices. The experimental results demonstrate that $Co\text{-}S^2P$ improves accuracy by up to 8.8\% and resource utilization by up to 1.2$\times$ compared to state-of-the-art methods, while reducing memory consumption by approximately 22\% and training time by about 24\% on all resource-limited devices.
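As a rough illustration of the semi-asynchronous idea mentioned above, the sketch below shows a server that aggregates as soon as a minimum number of client updates have arrived instead of waiting for every straggler, and that averages each parameter only over the clients whose pruned submodel kept it. This is a generic sketch, not the aggregation rule of $Co\text{-}S^2P$: the function name `semi_async_aggregate`, the `min_updates` threshold, and the plain per-index averaging are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def semi_async_aggregate(
    global_weights: List[float],
    arrived: Dict[str, Dict[int, float]],
    min_updates: int,
) -> Optional[List[float]]:
    """Aggregate once at least `min_updates` client updates are buffered.

    Hypothetical sketch: each update maps a parameter index to the value the
    client trained, so a client with a pruned submodel reports only the
    indices it kept. Returns None while too few updates have arrived.
    """
    if len(arrived) < min_updates:
        return None  # keep waiting, but never for all clients
    new_weights = list(global_weights)
    for i in range(len(global_weights)):
        values = [update[i] for update in arrived.values() if i in update]
        if values:  # average only over clients that trained index i
            new_weights[i] = sum(values) / len(values)
    return new_weights
```

Parameters that no arriving client trained keep their previous global value, which is one simple way to handle varying submodel architectures; the paper's own mechanism may differ.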
