subMFL: Compatiple subModel Generation for Federated Learning in Device Heterogenous Environment
Zeyneddin Oz, Ceylan Soygul Oz, Abdollah Malekjafarian, Nima Afraz, Fatemeh Golpayegani
TL;DR
The paper tackles federated learning across highly heterogeneous devices by enabling resource-constrained clients to participate. It introduces subMFL, which trains a dense global model ($GM$) on capable devices and then performs server-side, data-free unstructured pruning (by $L_1$-norm) to produce a set of sparse submodels ($SM$) with thresholds from 0.1 to 0.9, preserving the transferred weights. Each device selects the densest submodel it can train, which improves participation rates (up to ~70%) while incurring modest accuracy losses (e.g., MNIST ~2% at ~50% sparsity, FMNIST ~10%); aggregation uses FedAvg. This approach reduces energy and computation on edge devices, avoids per-device capability estimation, and improves robustness to device heterogeneity. Future work focuses on theoretical guarantees and on further compression strategies to reduce communication overhead.
Abstract
Federated Learning (FL) is commonly used in systems with distributed and heterogeneous devices that have access to varying amounts of data and diverse computing and storage capacities. The FL training process enables such devices to update the weights of a shared model locally using their local data; a trusted central server then combines all of those models to generate a global model. In this way, a global model is generated while the data remains local to the devices to preserve privacy. However, training large models such as Deep Neural Networks (DNNs) on resource-constrained devices can take a prohibitively long time and consume a large amount of energy. In the current process, low-capacity devices are excluded from training, although they might have access to unseen data. To overcome this challenge, we propose a model compression approach that enables heterogeneous devices with varying computing capacities to participate in the FL process. In our approach, the server shares a dense model with all devices to train it. Afterwards, the trained model is gradually compressed to obtain submodels with varying levels of sparsity, which serve as suitable initial global models for resource-constrained devices that were not capable of training the first dense model. This increases the participation rate of resource-constrained devices while preserving the weights transferred from the previous round of training. Our validation experiments show that, despite reaching about 50 per cent global sparsity, the generated submodels maintain their accuracy and can be shared to increase participation by around 50 per cent.
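The server-side compression step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: each level is interpreted as the fraction of weights removed by global magnitude (the text does not say whether the 0.1 to 0.9 thresholds are fractions or magnitude cutoffs), the model is a plain list of weight arrays, and a device's capacity is expressed as a hypothetical `max_nonzero` parameter budget. The function names `prune_by_magnitude`, `generate_submodels`, and `select_densest` are illustrative only.

```python
import numpy as np

def prune_by_magnitude(weights, fraction):
    """Zero the `fraction` of entries with the smallest |w| across all layers.

    Unstructured and data-free: only the trained weights are inspected.
    Surviving weights keep their trained values.
    """
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(round(fraction * flat.size))  # number of weights to remove
    if k == 0:
        return [w.copy() for w in weights]
    cutoff = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return [np.where(np.abs(w) <= cutoff, 0.0, w) for w in weights]

def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(w.size for w in weights)
    zeros = sum(int(np.count_nonzero(w == 0)) for w in weights)
    return zeros / total

def generate_submodels(global_model, fractions):
    """Build one sparse submodel per pruning level from the dense model."""
    return {f: prune_by_magnitude(global_model, f) for f in fractions}

def select_densest(submodels, max_nonzero):
    """Return the lowest pruning level whose submodel fits the budget.

    `max_nonzero` is a hypothetical stand-in for a device's capacity.
    Returns None if even the sparsest submodel is too large.
    """
    for f in sorted(submodels):
        nonzero = sum(int(np.count_nonzero(w)) for w in submodels[f])
        if nonzero <= max_nonzero:
            return f
    return None

# Toy "trained" dense model with 40 weights in two layers (assumed shapes).
rng = np.random.default_rng(0)
global_model = [rng.normal(size=(8, 4)), rng.normal(size=(4, 2))]

fractions = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 ... 0.9
submodels = generate_submodels(global_model, fractions)

# A device that can train at most 20 non-zero weights.
chosen = select_densest(submodels, max_nonzero=20)
print(chosen, sparsity(submodels[chosen]))
```

In this sketch every submodel is pruned directly from the dense model, so the masks are nested: a sparser submodel keeps a subset of the weights kept by a denser one. Pruning each submodel from the previous one, as "gradually compressed" may suggest, would yield the same masks under this fraction-based interpretation.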
