FedTSA: A Cluster-based Two-Stage Aggregation Method for Model-heterogeneous Federated Learning
Boyu Fan, Chenrui Wu, Xiang Su, Pan Hui
TL;DR
The paper tackles system heterogeneity in federated learning (FL) by allowing clients to use heterogeneous architectures instead of a single shared global model. It introduces FedTSA, which clusters clients by their resources using kernel density estimation (KDE), applies Stage 1 in-cluster weight averaging with pruning, and performs Stage 2 server-side deep mutual learning on synthetic data generated by a diffusion model from a prompt pool. The global knowledge is the average of the logits $z_r$ of the $M$ cluster models, $z_{avg}=\frac{1}{M}\sum_{r=1}^{M} z_r$, and each model is updated with a KL divergence loss against these averaged logits on the synthetic data $\mathcal{X}$. Key contributions include KDE-based, resource-aware clustering with pruning rates, a diffusion-based data generation pipeline for cross-cluster knowledge transfer, and comprehensive experiments showing that FedTSA outperforms baselines on CIFAR-10, CIFAR-100, and Tiny-ImageNet under both IID and non-IID data distributions. The work demonstrates practical value for deploying FL on heterogeneous hardware and analyzes hyperparameters such as the prompts, the temperature $T$, and the quantity of synthetic data, while highlighting trade-offs related to diffusion overhead and task focus.
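The Stage 2 objective described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each model emits one logit vector per synthetic sample, applies a temperature-scaled softmax, and takes the KL direction as KL(softmax(z_avg/T) ‖ softmax(z_r/T)), the usual knowledge-distillation convention. All function names are hypothetical, and the sketch only evaluates the losses; a real training loop would backpropagate them through each model with an autodiff framework.

```python
import math
from typing import List


def softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_logits(model_logits: List[List[float]]) -> List[float]:
    """z_avg = (1/M) * sum_{r=1}^{M} z_r for one synthetic sample."""
    M = len(model_logits)
    return [sum(col) / M for col in zip(*model_logits)]


def kl_to_average(z_avg: List[float], z_r: List[float], T: float = 1.0) -> float:
    """KL(softmax(z_avg / T) || softmax(z_r / T)) for one sample and one model."""
    p = softmax(z_avg, T)
    q = softmax(z_r, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def stage2_losses(batch: List[List[List[float]]], T: float = 1.0) -> List[float]:
    """Mean KL loss of each model over a batch of synthetic samples.

    batch[n][r] is the logit vector of model r on synthetic sample n.
    """
    M = len(batch[0])
    totals = [0.0] * M
    for sample_logits in batch:
        z_avg = average_logits(sample_logits)
        for r in range(M):
            totals[r] += kl_to_average(z_avg, sample_logits[r], T)
    return [t / len(batch) for t in totals]


if __name__ == "__main__":
    # Two models, two synthetic samples: they disagree on the first, agree on the second.
    batch = [
        [[2.0, 0.5, -1.0], [1.0, 1.0, 0.0]],
        [[0.0, 3.0, 0.0], [0.0, 3.0, 0.0]],
    ]
    print(stage2_losses(batch, T=2.0))
```

A model whose predictions already match the averaged logits incurs zero loss, so the update pulls each model toward the consensus of all models rather than toward any single one.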
Abstract
Despite extensive research into data heterogeneity in federated learning (FL), system heterogeneity remains a significant yet often overlooked challenge. Traditional FL approaches typically assume homogeneous hardware resources across FL clients, implying that clients can train a global model within a comparable time frame. However, in practical FL systems, clients often have heterogeneous resources, which affects their training capacity. This discrepancy underscores the importance of exploring model-heterogeneous FL, a paradigm that allows clients to train different models based on their resource capabilities. To address this challenge, we introduce FedTSA, a cluster-based two-stage aggregation method tailored for system heterogeneity in FL. FedTSA first clusters clients based on their capabilities and then performs a two-stage aggregation: conventional weight averaging for homogeneous models in Stage 1, and deep mutual learning with a diffusion model for aggregating heterogeneous models in Stage 2. Extensive experiments demonstrate that FedTSA outperforms the baselines and further explore various factors influencing model performance, validating FedTSA as a promising approach for model-heterogeneous FL.
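The clustering step and Stage 1 can likewise be sketched under stated assumptions. The sketch assumes each client reports a single scalar resource score, fits a 1-D Gaussian KDE to those scores, and splits clusters at interior local minima of the estimated density; it then applies FedAvg-style averaging, weighted by local sample counts, to the flattened weights of clients in the same cluster. The bandwidth, the split rule, the sample-count weighting, and all function names are hypothetical choices for illustration, and the pruning step is omitted.

```python
import math
from typing import Callable, Dict, List


def kde_density(samples: List[float], bandwidth: float) -> Callable[[float], float]:
    """Return a 1-D Gaussian kernel density estimate fitted to `samples`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def kde_clusters(resources: List[float], bandwidth: float = 0.1, grid_size: int = 200) -> List[int]:
    """Assign a cluster label to each client by cutting the KDE at its local minima."""
    density = kde_density(resources, bandwidth)
    lo, hi = min(resources), max(resources)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    d = [density(x) for x in grid]
    # Interior local minima of the estimated density act as cluster boundaries.
    cuts = [grid[i] for i in range(1, grid_size - 1) if d[i] < d[i - 1] and d[i] <= d[i + 1]]
    # Label = number of boundaries below the client's score (0 = lowest-resource cluster).
    return [sum(1 for c in cuts if r > c) for r in resources]


def stage1_average(weights: List[List[float]], labels: List[int], sizes: List[int]) -> Dict[int, List[float]]:
    """FedAvg-style average of flattened weights within each cluster, weighted by sample count."""
    groups: Dict[int, list] = {}
    for w, k, n in zip(weights, labels, sizes):
        groups.setdefault(k, []).append((w, n))
    averaged: Dict[int, List[float]] = {}
    for k, members in groups.items():
        total = sum(n for _, n in members)
        dim = len(members[0][0])  # clients in one cluster share an architecture
        averaged[k] = [sum(w[j] * n for w, n in members) / total for j in range(dim)]
    return averaged


if __name__ == "__main__":
    resources = [0.1, 0.12, 0.15, 0.8, 0.85, 0.9]  # hypothetical normalized resource scores
    labels = kde_clusters(resources, bandwidth=0.05)
    weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
    print(labels, stage1_average(weights, labels, sizes=[1, 1, 1, 1, 1, 1]))
```

Splitting at density minima avoids fixing the number of clusters in advance: well-separated resource tiers produce separate peaks, and each valley between them becomes a boundary.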
