A deep cut into Split Federated Self-supervised Learning
Marcin Przewięźlikowski, Marcin Osial, Bartosz Zieliński, Marek Śmieja
TL;DR
The paper investigates how the depth at which a neural network is split between clients and a central server affects privacy, communication overhead, and learning performance in federated self-supervised learning. It analyzes the shortcomings of the MoCo-based Split Federated Learning approach (MocoSFL), in particular its sensitivity to deep splits, which stems from misalignment between the online and momentum models during synchronization. The authors propose MonAcoSFL, a momentum-aligned variant that synchronizes both the online and the momentum client models, proving that this alignment preserves the contrastive objective and yields substantial accuracy gains under deeper, more communication-efficient splits. Empirically, MonAcoSFL achieves state-of-the-art performance with ResNet-18 and MobileNetV2 backbones on CIFAR-10/100 with non-IID data. It also strengthens privacy protection and reduces communication overhead, making federated SSL more practical for real-world deployments.
Abstract
Collaborative self-supervised learning has recently become feasible in highly distributed environments by dividing the network layers between client devices and a central server. However, state-of-the-art methods, such as MocoSFL, are optimized for splitting the network at its initial layers, which weakens the protection of client data and increases communication overhead. In this paper, we demonstrate that splitting depth is crucial for maintaining privacy and communication efficiency in distributed training. We also show that MocoSFL suffers a catastrophic drop in quality at the splits that minimize communication overhead. As a remedy, we introduce Momentum-Aligned contrastive Split Federated Learning (MonAcoSFL), which aligns the online and momentum client models during training. Consequently, we achieve state-of-the-art accuracy while significantly reducing the communication overhead, making MonAcoSFL more practical in real-world scenarios.
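
The alignment idea can be illustrated with a toy sketch. It is not the authors' implementation: it assumes that client synchronization is an equal-weight average of parameters, and the names `average`, `broadcast`, `sync_online_only`, `sync_momentum_aligned`, and `gap` are hypothetical helpers introduced only for this example. The sketch compares a synchronization step that averages only the online client models with one that averages the momentum client models as well, and reports how far each client's online model ends up from its momentum model.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]  # layer name -> flat list of weights (toy stand-in for a model)


def average(models: List[Params]) -> Params:
    """Equal-weight, element-wise mean of the clients' parameters (assumed sync rule)."""
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n for i in range(len(values))]
        for name, values in models[0].items()
    }


def broadcast(params: Params, n_clients: int) -> List[Params]:
    """Give every client its own copy of the shared parameters."""
    return [{k: list(v) for k, v in params.items()} for _ in range(n_clients)]


def sync_online_only(
    online: List[Params], momentum: List[Params]
) -> Tuple[List[Params], List[Params]]:
    """Average the online models; each client keeps its own momentum model."""
    return broadcast(average(online), len(online)), momentum


def sync_momentum_aligned(
    online: List[Params], momentum: List[Params]
) -> Tuple[List[Params], List[Params]]:
    """Average the online models and, separately, the momentum models."""
    n = len(online)
    return broadcast(average(online), n), broadcast(average(momentum), n)


def gap(online: Params, momentum: Params) -> float:
    """Largest absolute weight difference between one client's two models."""
    return max(abs(o - m) for k in online for o, m in zip(online[k], momentum[k]))


# Two toy clients whose momentum weights trail their online weights by 0.5.
online = [{"w": [1.0]}, {"w": [3.0]}]
momentum = [{"w": [0.5]}, {"w": [2.5]}]

o1, m1 = sync_online_only(online, momentum)
o2, m2 = sync_momentum_aligned(online, momentum)
gaps_online_only = [gap(o, m) for o, m in zip(o1, m1)]
gaps_aligned = [gap(o, m) for o, m in zip(o2, m2)]
print(gaps_online_only)  # [1.5, 0.5]
print(gaps_aligned)      # [0.5, 0.5]
```

In this toy setting, averaging only the online models moves the first client's online weights well away from its momentum weights, whereas averaging both keeps the pre-synchronization gap of 0.5 on every client. A real system would apply the same step to the client-side layers of the backbone rather than to a single scalar weight.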
