Resource-efficient Layer-wise Federated Self-supervised Learning

Ye Lin Tun, Chu Myaet Thwal, Huy Q. Le, Minh N. H. Nguyen, Eui-Nam Huh, Choong Seon Hong

TL;DR

The paper tackles the resource bottleneck of federated self-supervised learning on edge devices by introducing LW-FedSSL, a layer-wise training framework that trains one layer at a time to sharply cut memory, computation, and communication costs. It also proposes Prog-FedSSL, a progressive variant that updates all existing layers at each stage and offers a different resource-performance trade-off. Through extensive experiments on ViT-Tiny and ResNet-18 across multiple datasets (CIFAR, TinyImageNet, Caltech-101, MedMNIST), LW-FedSSL achieves substantial resource reductions (up to 3.34× in memory, 4.20× in GFLOPs, and 5.07× in communication) with comparable downstream performance, while Prog-FedSSL provides further performance gains in many settings. The results demonstrate robustness to data heterogeneity and partial client participation, highlighting the practical potential of layer-wise and progressive training for scalable, privacy-preserving SSL in federated environments.

Abstract

Many studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw data distributed across edge devices. However, edge devices often struggle with high computational and communication costs imposed by SSL and FL algorithms. With the deployment of more complex and large-scale models, these challenges are exacerbated. To tackle this, we propose Layer-Wise Federated Self-Supervised Learning (LW-FedSSL), which allows edge devices to incrementally train a small part of the model at a time. Specifically, in LW-FedSSL, training is decomposed into multiple stages, with each stage responsible for only a specific layer of the model. Since only a portion of the model is active for training at any given time, LW-FedSSL significantly reduces computational requirements. Additionally, only the active model portion needs to be exchanged between the FL server and clients, reducing communication overhead. This enables LW-FedSSL to jointly address both computational and communication challenges of FL client devices. It can achieve up to a $3.34 \times$ reduction in memory usage, $4.20 \times$ fewer computational operations (giga floating point operations, GFLOPs), and a $5.07 \times$ lower communication cost while maintaining performance comparable to its end-to-end training counterpart. Furthermore, we explore a progressive training strategy called Progressive Federated Self-Supervised Learning (Prog-FedSSL), which offers a $1.84\times$ reduction in GFLOPs and a $1.67\times$ reduction in communication costs while maintaining the same memory requirements as end-to-end training. Although the resource efficiency of Prog-FedSSL is lower than that of LW-FedSSL, its performance improvements make it a viable candidate for FL environments with more lenient resource constraints.
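The stage mechanics described above can be sketched in plain Python. This is a minimal, illustrative accounting under stated assumptions, not the authors' implementation: one stage per encoder layer (so the number of stages $S$ equals the number of layers), equal-sized layers, the same number of rounds per stage, the end-to-end baseline run for the same total number of rounds, and SSL projection/prediction heads ignored. For LW-FedSSL it further assumes that clients keep the frozen layers from earlier stages, so only the active layer is exchanged; for Prog-FedSSL it assumes all layers present so far are exchanged. The names `stage_plan`, `StagePlan`, and `total_exchanged` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StagePlan:
    stage: int                  # 1-indexed stage s
    depth: int                  # encoder layers present on the client at stage s
    trainable: Tuple[int, ...]  # 0-indexed layers updated by local training
    exchanged: Tuple[int, ...]  # 0-indexed layers sent between server and clients


def stage_plan(strategy: str, stage: int, num_layers: int) -> StagePlan:
    """Which layers are trained and exchanged at stage `stage`.

    strategy: "e2e" (end-to-end FedSSL), "lw" (LW-FedSSL) or "prog" (Prog-FedSSL).
    Assumes one stage per layer, so stages run from 1 to num_layers.
    """
    if not 1 <= stage <= num_layers:
        raise ValueError("stage must be in [1, num_layers]")
    if strategy == "e2e":
        # Full model is trained and exchanged in every round.
        every = tuple(range(num_layers))
        return StagePlan(stage, num_layers, every, every)
    if strategy == "lw":
        # Only the newly added layer L_s is updated; L_1..L_{s-1} stay frozen.
        # Assumption: clients keep the frozen layers from earlier stages,
        # so only L_s travels over the network.
        active = (stage - 1,)
        return StagePlan(stage, stage, active, active)
    if strategy == "prog":
        # Assumption: all layers present so far (L_1..L_s) are updated and exchanged.
        present = tuple(range(stage))
        return StagePlan(stage, stage, present, present)
    raise ValueError(f"unknown strategy: {strategy!r}")


def total_exchanged(strategy: str, num_layers: int,
                    rounds_per_stage: int, params_per_layer: int) -> int:
    """Parameters one client uploads over all stages (toy accounting).

    Assumes equal-sized layers and ignores SSL projection/prediction heads.
    """
    return sum(
        len(stage_plan(strategy, s, num_layers).exchanged)
        * params_per_layer * rounds_per_stage
        for s in range(1, num_layers + 1)
    )


if __name__ == "__main__":
    # Hypothetical sizes: a 12-layer encoder, 10 rounds per stage, 1,000 params per layer.
    for name in ("e2e", "lw", "prog"):
        print(name, total_exchanged(name, 12, 10, 1_000))
    # e2e 1440000
    # lw 120000
    # prog 780000
```

With 12 equal layers this toy accounting gives 12× less traffic for LW-FedSSL and about 1.85× less for Prog-FedSSL relative to end-to-end training. These ratios follow only from the simplifying assumptions above and are not expected to match the reported 5.07× and 1.67× reductions.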

Paper Structure (29 sections, 2 equations, 11 figures, 13 tables, 4 algorithms)

Figures (11)

  • Figure 1: Four key steps in a single FL communication round. (i) The server distributes a base global model $M$ to all participating clients. (ii) Each client $n \in [1,N]$ trains the model on its local dataset $D^n$ to produce a local model $M^n$. (iii) The local models are then transmitted back to the server. (iv) Finally, the server aggregates the received local models using weighted averaging to update the global model: $M \leftarrow \sum_{n=1}^{N} w^n M^n$ [mcmahan2017communication].
  • Figure 2: Self-supervised learning with MoCoV3 [9711302]. We omit the negative samples for clarity.
  • Figure 3: LW-FedSSL: Local training process across different stages for the $n$-th client. At the beginning of each stage $s \in [1,S]$, a new layer $L_s$ is sequentially added to the previous encoder $F_{(s-1)}$, increasing its depth. During stage $s$, only the corresponding layer $L_s$ is actively updated, while all prior layers (i.e., $L_1$ to $L_{(s-1)}$) are kept frozen.
  • Figure 4: Comparison of FedSSL, LW-FedSSL (ours), and Prog-FedSSL (ours) at stage $s$ using MoCoV3 as the SSL backbone.
  • Figure 5: Computational and communication resources required for a client: (a) memory usage, (b) GFLOPs consumption, (c) communication cost.
  • ...and 6 more figures
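The aggregation step in Figure 1, $M \leftarrow \sum_{n=1}^{N} w^n M^n$, combined with the partial exchange shown in Figures 3 and 4, can be sketched as a weighted average restricted to the exchanged parameters. This is a minimal sketch, not the paper's code. It assumes the standard FedAvg choice of weights proportional to local dataset size, $w^n = |D^n| / \sum_{m=1}^{N} |D^m|$ (the caption only says "weighted averaging"). The dict-of-lists model representation and the helper name `fedavg` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical model representation: parameter name -> flat list of floats.
Model = Dict[str, List[float]]


def fedavg(local_models: Sequence[Model], num_samples: Sequence[int],
           keys: Optional[Sequence[str]] = None) -> Model:
    """Weighted average M = sum_n w^n M^n with w^n = |D^n| / sum_m |D^m|.

    If `keys` is given, only those parameters are aggregated (for example the
    active layer in LW-FedSSL); otherwise every parameter is averaged.
    """
    if not local_models or len(local_models) != len(num_samples):
        raise ValueError("need one sample count per local model")
    total = sum(num_samples)
    weights = [n / total for n in num_samples]
    names = list(local_models[0]) if keys is None else list(keys)
    global_model: Model = {}
    for name in names:
        size = len(local_models[0][name])
        # Element-wise weighted sum over clients for this parameter.
        global_model[name] = [
            sum(w * m[name][i] for w, m in zip(weights, local_models))
            for i in range(size)
        ]
    return global_model


if __name__ == "__main__":
    client_a = {"layer1": [0.0, 0.0], "layer2": [1.0, 2.0]}
    client_b = {"layer1": [0.0, 0.0], "layer2": [5.0, 6.0]}
    # Aggregate only the active layer, with 100 and 300 local samples.
    print(fedavg([client_a, client_b], [100, 300], keys=["layer2"]))
    # {'layer2': [4.0, 5.0]}
```

Passing `keys` mirrors the idea that in LW-FedSSL only the active layer is uploaded and averaged, while leaving it as `None` corresponds to end-to-end aggregation of the full model.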