FedSKD: Aggregation-free Model-heterogeneous Federated Learning using Multi-dimensional Similarity Knowledge Distillation

Ziqiao Weng, Weidong Cai, Bo Zhou

TL;DR

FedSKD tackles model-heterogeneous federated learning in privacy-sensitive medical contexts by removing centralized aggregation and enabling round-robin exchange of heterogeneous models. Its core is a bidirectional knowledge transfer framework built on multi-dimensional similarity knowledge distillation, aligning batch-wise, pixel/voxel-wise, and region-wise representations to prevent model drift and knowledge dilution. Across ASD (ABIDE-derived FedASD) and skin lesion (Derm7pt-derived FedSkin) tasks with non-IID partitions, FedSKD consistently outperforms aggregation-based and baseline P2P methods in both client-specific personalization and cross-institution generalization. The results underscore FedSKD’s potential as a scalable, robust solution for realistic medical FL deployments, while highlighting avenues for efficiency, security, and broader task applicability.

Abstract

Federated learning (FL) enables privacy-preserving collaborative model training without direct data sharing. Model-heterogeneous FL (MHFL) extends this paradigm by allowing clients to train personalized models with heterogeneous architectures tailored to their computational resources and application-specific needs. However, existing MHFL methods predominantly rely on centralized aggregation, which introduces scalability and efficiency bottlenecks, or impose restrictions requiring partially identical model architectures across clients. While peer-to-peer (P2P) FL removes server dependence, it suffers from model drift and knowledge dilution, limiting its effectiveness in heterogeneous settings. To address these challenges, we propose FedSKD, a novel MHFL framework that facilitates direct knowledge exchange through round-robin model circulation, eliminating the need for centralized aggregation while allowing fully heterogeneous model architectures across clients. FedSKD's key innovation lies in multi-dimensional similarity knowledge distillation, which enables bidirectional cross-client knowledge transfer at batch, pixel/voxel, and region levels for heterogeneous models in FL. This approach mitigates catastrophic forgetting and model drift through progressive reinforcement and distribution alignment while preserving model heterogeneity. Extensive evaluations on fMRI-based autism spectrum disorder diagnosis and skin lesion classification demonstrate that FedSKD outperforms state-of-the-art heterogeneous and homogeneous FL baselines, achieving superior personalization (client-specific accuracy) and generalization (cross-institutional adaptability). These findings underscore FedSKD's potential as a scalable and robust solution for real-world medical federated learning applications.
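The round-robin model circulation mentioned in the abstract can be illustrated with a small scheduling sketch. It assumes, purely for illustration, that at every transfer phase each circulating model moves one position forward along a predefined client order; the function name `round_robin_holders` and the client labels are hypothetical and do not come from the paper.

```python
def round_robin_holders(order, num_transfers):
    """For each transfer phase, map every client to the origin of the model it receives.

    Assumption (not from the paper): each circulating model advances exactly
    one position along `order` per transfer phase, wrapping around at the end.
    """
    n = len(order)
    schedule = []
    for step in range(1, num_transfers + 1):
        # The model that started at order[i] is now held by order[(i + step) % n].
        holders = {order[(i + step) % n]: order[i] for i in range(n)}
        schedule.append(holders)
    return schedule


# Hypothetical three-client federation.
order = ["A", "B", "C"]
schedule = round_robin_holders(order, num_transfers=3)
for phase, holders in enumerate(schedule, start=1):
    print(phase, holders)
```

Under this assumption every model visits every client once per full cycle without any server coordinating the exchange, which is the property the aggregation-free design relies on.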

Paper Structure

This paper contains 18 sections, 16 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Comparison of traditional MHFL schemes (a) and our proposed MHFL framework (b). Traditional MHFL methods rely on a central server for knowledge fusion and require partially identical model structures across clients. In contrast, our MHFL framework operates without a central server and allows fully heterogeneous model architectures while enabling direct knowledge transfer among clients.
  • Figure 2: (a) Traditional P2P FL methods are prone to model drift and knowledge dilution, limiting their effectiveness in MHFL. (b) Our FedSKD framework enhances knowledge transfer by circulating heterogeneous models among clients in a round-robin manner guided by a predefined order $\mathcal{O}$. During non-transfer phases, clients refine their personalized domain-adaptive models (DAM) locally. In transfer phases, bidirectional knowledge transfer occurs between the local DAM and the received knowledge-transit model (KTM) through multi-dimensional similarity knowledge distillation (SKD). This process allows the DAM to share domain-specific knowledge with the KTM, while the KTM provides cross-client insights to the DAM, ensuring continuous knowledge reinforcement and mitigating model drift through consistent distribution alignment.
  • Figure 3: (a) Overview of the FedSKD training framework: Multi-dimensional similarity knowledge distillation facilitates the mutual distillation of semantically meaningful knowledge between the local DAM and the received KTM (Figure 2). Both models are jointly optimized on the DAM's dataset, leveraging supervised learning with corresponding dataset labels. The snowflake icon signifies frozen parameters, whereas the flame icon represents active parameter updates during training. (b) Computation of multi-dimensional similarity knowledge distillation loss: The loss is calculated by measuring divergence between the learned similarity patterns of the DAM and KTM across multiple granularities (Batch-wise, Pixel/Voxel-wise, Region-wise), ensuring effective knowledge transfer and alignment.
  • Figure 4: Distribution of non-IID data across the clients in our FedSkin datasets for skin lesion classification. Different colors represent different clients with unique non-IID distributions.
  • Figure 5: Comparison of AUC Improvements: FedSKD vs. FedCross. (a) Summary of AUC performance gain across 17 institutions from the FedASD dataset (see the data distribution table), comparing FedSKD and FedCross. (b) Summary of AUC performance gain across 5 skin lesion types from the FedSkin dataset (Figure 4), comparing FedSKD and FedCross.
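
The batch-wise term of the similarity distillation loss described in the Figure 3 caption can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: it assumes a similarity-preserving style loss (row-normalized Gram matrices compared by mean squared difference), whereas the paper's actual divergence measure may differ. The function names and the toy feature values are hypothetical.

```python
import math


def batch_similarity(features):
    """Return the row-normalized b x b Gram matrix for a batch of feature vectors."""
    gram = [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]
    normalized = []
    for row in gram:
        # Guard against an all-zero row so the division is always defined.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        normalized.append([v / norm for v in row])
    return normalized


def similarity_kd_loss(feats_a, feats_b):
    """Mean squared difference between two models' batch similarity matrices.

    Assumption: squared error stands in for the paper's divergence measure.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both models must be evaluated on the same batch")
    sim_a = batch_similarity(feats_a)
    sim_b = batch_similarity(feats_b)
    b = len(sim_a)
    total = sum((x - y) ** 2
                for row_a, row_b in zip(sim_a, sim_b)
                for x, y in zip(row_a, row_b))
    return total / (b * b)


# Hypothetical embeddings of the same 3-sample batch from two architectures
# with different feature widths (2 for the DAM, 3 for the KTM).
dam_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ktm_feats = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [2.0, 3.0, 0.0]]
loss = similarity_kd_loss(dam_feats, ktm_feats)
print(loss)
```

Both similarity matrices have shape b x b regardless of each model's feature width, so the comparison needs no projection layer between architectures. That is what makes this family of losses a natural fit for fully heterogeneous models.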