Heterogeneity-Aware Coordination for Federated Learning via Stitching Pre-trained blocks

Shichen Zhan; Yebo Wu; Chunlin Tian; Yan Zhao; Li Li

TL;DR

This paper tackles resource-constrained federated learning by introducing FedStitch, a training-free paradigm that stitches blocks from multiple pre-trained models to form task-specific networks. It combines a reinforcement-learning–based weighted aggregation to counter non-IID data, an on-the-fly search-space optimizer to reduce block candidates, and a local energy coordinator to meet per-round deadlines with minimized energy use. The approach yields up to 20.93% absolute accuracy gains on non-IID data, up to 8.12x speedups, and substantial memory and energy savings (up to 79.5% memory, 89.41% energy) compared to training-based baselines. FedStitch enables broad participation of resource-constrained devices while maintaining strong generalization across CIFAR10/100 and CINIC10, suggesting a practical path for deployment of FL in real-world edge settings.

Abstract

Federated learning (FL) coordinates multiple devices to collaboratively train a shared model while preserving data privacy. However, the large memory footprint and high energy consumption of training exclude low-end devices from contributing their own data to the global model, which severely degrades model performance in real-world scenarios. In this paper, we propose FedStitch, a hierarchical coordination framework for heterogeneous federated learning with pre-trained blocks. Unlike traditional approaches that train the global model from scratch, FedStitch composes the global model for a new task by stitching pre-trained blocks. Specifically, each participating client uses its local data to select the most suitable block from a candidate pool composed of blocks from pre-trained models. The server then aggregates these selections to determine the optimal block for stitching. This process iterates until a new stitched network is generated. Beyond the new training paradigm, FedStitch consists of three core components: 1) an RL-weighted aggregator, 2) a search space optimizer deployed on the server side, and 3) a local energy optimizer deployed on each participating client. The RL-weighted aggregator helps select the right block in the non-IID scenario, while the search space optimizer continuously reduces the size of the candidate block pool during stitching. Meanwhile, the local energy optimizer minimizes the energy consumption of each client while guaranteeing the overall training progress. The results demonstrate that, compared to existing approaches, FedStitch improves model accuracy by up to 20.93%. At the same time, it achieves up to 8.12% speedup, reduces the memory footprint by up to 79.5%, and saves up to 89.41% of energy during the learning procedure.
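The select-then-aggregate loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `aggregate_votes` and `stitch`, the per-client scoring callables, and the fixed `weights` are all hypothetical. The fixed weights stand in for the RL-weighted aggregator, and dropping the chosen block from the pool is only a crude stand-in for the search space optimizer.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: a scorer rates a candidate block given the
# blocks stitched so far, using only the client's local data.
Scorer = Callable[[List[str], str], float]


def aggregate_votes(votes: Sequence[str], weights: Sequence[float]) -> str:
    """Return the block with the largest total weight among client votes."""
    totals: Dict[str, float] = defaultdict(float)
    for block, weight in zip(votes, weights):
        totals[block] += weight
    # Iterate in sorted order so ties resolve to the alphabetically first block.
    return max(sorted(totals), key=lambda b: totals[b])


def stitch(pool: Sequence[str], client_scorers: Sequence[Scorer],
           weights: Sequence[float], depth: int) -> List[str]:
    """Build a stitched network one block at a time from client votes."""
    stitched: List[str] = []
    candidates = list(pool)
    for _ in range(depth):
        if not candidates:
            break
        # Each client votes for the block it scores highest on its own data.
        votes = [max(candidates, key=lambda b: scorer(stitched, b))
                 for scorer in client_scorers]
        # Assumption: fixed weights replace the learned RL weights.
        chosen = aggregate_votes(votes, weights)
        stitched.append(chosen)
        # Assumption: a chosen block is not reused in later positions.
        candidates.remove(chosen)
    return stitched
```

In this sketch only block identifiers and votes travel between clients and server, which mirrors the point in the abstract that clients select blocks rather than train them. How a client actually scores a block on its local data is left abstract here.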

Paper Structure (26 sections, 7 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Related Work
FL on Resource-limited Devices
Pre-trained Neural Network in Federated Learning
FedStitch: Overall Learning Paradigm
FedStitch: Core Components
Overview
RL Weighted Aggregator
Search Space Optimizer
Local Energy Coordinator
Evaluation
Experimental Setup
Models and Block Pool
Datasets
Baselines
...and 11 more sections

Figures (8)

Figure 1: Workflow of Stitched Network Generation.
Figure 2: Motivation experiments for statistical heterogeneity (CIFAR10). 'Number of Classes' is the number of classes present in a client's local data ('10' means the client holds data from all classes).
Figure 3: The System Overview of FedStitch.
Figure 4: Efficiency comparison of various schemes with baseline group 1 (a-f) and baseline group 2 (g-l) on CIFAR10, CINIC10, and CIFAR100 datasets in IID/Non-IID scenarios on Jetson TX2. The performance of FedStitch is denoted as $\bigstar$. Each of $alexnet$, $resnet50$, $vgg16$, $densenet$, and $mobilenet$ refers to whichever of the two fine-tuning methods (FT-Full or FT-Part) yielded higher accuracy for that model.
Figure 5: Memory consumption per round. Left: group 1; Right: group 2.
...and 3 more figures
