VGS-ATD: Robust Distributed Learning for Multi-Label Medical Image Classification Under Heterogeneous and Imbalanced Conditions

Zehui Zhao, Laith Alzubaidi, Haider A. Alwzwazy, Jinglan Zhang, Yuantong Gu

TL;DR

VGS-ATD introduces a privacy-preserving, distributed learning framework for robust, scalable multi-label medical image classification under data heterogeneity and imbalance. By combining AI-To-Data with Vision Transformers and a one-time backbone aggregation, it enables horizontal, vertical, and hierarchical configurations that support flexible, incremental node addition without full retraining. Across 30 datasets and 80 classes, VGS-ATD consistently outperforms centralized, federated, and swarm baselines in accuracy while delivering substantial reductions in training time and computational cost, and shows resilience to catastrophic forgetting through hierarchical ATD-in-ATD design. The work demonstrates practical potential for real-world, privacy-preserving medical AI systems capable of continuous learning in dynamic clinical environments.

Abstract

In recent years, advanced deep learning architectures have shown strong performance in medical imaging tasks. However, the traditional centralized learning paradigm poses serious privacy risks because all data are collected and the model is trained on a single server. To mitigate this risk, decentralized approaches such as federated learning and swarm learning have emerged, allowing model training on local nodes while sharing only model weights. While these methods enhance privacy, they struggle with heterogeneous and imbalanced data and suffer from inefficiencies due to frequent communication and weight aggregation. More critically, the dynamic and complex nature of clinical environments demands scalable AI systems capable of continuously learning from diverse modalities and multiple labels. Yet both centralized and decentralized models are prone to catastrophic forgetting during system expansion, often requiring full model retraining to incorporate new data. To address these limitations, we propose VGS-ATD, a novel distributed learning framework. In experiments spanning 30 datasets and 80 independent labels across distributed nodes, VGS-ATD achieved an overall accuracy of 92.7%, outperforming centralized learning (84.9%) and swarm learning (72.99%), while federated learning failed under these conditions because of its high computational resource requirements. VGS-ATD also demonstrated strong scalability, with only a 1% drop in accuracy on existing nodes after expansion, compared to a 20% drop in centralized learning, highlighting its resilience to catastrophic forgetting. Additionally, it reduced computational costs by up to 50% relative to both centralized and swarm learning, confirming its superior efficiency and scalability.

Paper Structure

This paper contains 27 sections, 10 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: a, Sample of the cloud-based centralized learning paradigm: each client's data and learned weights are stored on the central server. b, Sample of the federated learning paradigm: only learned weights are shared with the central server, while raw data and computational resources are kept by the clients. c, Sample of the swarm learning process, with both data and weights kept locally by the clients. d, Sample of the ATD learning process: each client trains a local model, shares learned weights with the other clients, and keeps its raw data local. Unlike the others, ATD operates efficiently at both intra-node and inter-node levels: within each node, data is split into low-resource batches (trained on single- or multi-GPU setups) and trained into a local model. These local models are then shared in a fully connected manner, enabling collaboration without central coordination. ATD supports continuous, incremental learning as new data or tasks emerge, while preserving privacy and minimizing costly retraining.
  • Figure 2: A sample of the VGS-ATD workflow. Each client node can train a local model using the ViT architecture and receive the backbone extractor weights from other nodes to aggregate a global extractor, utilizing ensemble learning to build a customized classifier. The pretrained weights can also be transferred through weight exchange, benefiting the local model's training efficiency and improving performance.
  • Figure 3: (a), A sample of VGS-ATD's horizontal training configuration, used when two nodes share the same feature space. Mn represents the corresponding model n, and Wn represents the weights transferred from model n. The final aggregated model is called Mag. (b), A sample of VGS-ATD's vertical training configuration, used when two nodes do not overlap in feature space. Mp represents a shared pretrained model, and only the extractor part is transferred and aggregated in this setting. (c), A sample of VGS-ATD's ATD-in-ATD configuration. Model aggregation can be performed at multiple levels.
  • Figure 4: (a), Summary of the medical datasets used in the experiment, categorised by region of the human body and disease type. The specific disease type and the imaging technique used to collect each dataset are also provided. (b), The ratio of grey-scale to colour medical image samples. (c), The ratio of medical image samples collected using different imaging techniques. (d), The ratio of each dataset's samples and the body region they belong to.
  • Figure 5: P1 training time and computational cost comparison.
  • ...and 3 more figures
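
The captions for Figures 2 and 3 describe nodes exchanging backbone extractor weights, aggregating them into a global extractor once, and keeping a customized classifier per node. The sketch below illustrates that idea under stated assumptions: the aggregation rule is taken to be a plain element-wise mean (the source does not specify the rule), weights are represented as plain dictionaries of float lists rather than real ViT tensors, and the names `aggregate_backbone`, `build_node_model`, and the `backbone.` / `head.` key prefixes are hypothetical. The ensemble step mentioned in the Figure 2 caption is not modelled here.

```python
from typing import Dict, List

# Hypothetical weight container: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def aggregate_backbone(node_weights: List[Weights],
                       backbone_prefix: str = "backbone.") -> Weights:
    """Average backbone parameters element-wise across nodes.

    Assumption: a simple mean stands in for the paper's aggregation rule.
    Only keys starting with `backbone_prefix` are aggregated, so classifier
    heads never leave their node.
    """
    if not node_weights:
        raise ValueError("need at least one node")
    keys = [k for k in node_weights[0] if k.startswith(backbone_prefix)]
    aggregated: Weights = {}
    for key in keys:
        # Group the i-th element of this parameter from every node.
        columns = zip(*(w[key] for w in node_weights))
        aggregated[key] = [sum(col) / len(node_weights) for col in columns]
    return aggregated


def build_node_model(local: Weights, shared_backbone: Weights) -> Weights:
    """Combine the shared extractor with a node's own classifier head."""
    merged = dict(local)            # copy, so the local weights stay untouched
    merged.update(shared_backbone)  # overwrite only the backbone entries
    return merged


# Two toy nodes whose heads differ in size (different label sets).
node_a = {"backbone.w": [1.0, 3.0], "head.w": [0.5]}
node_b = {"backbone.w": [3.0, 5.0], "head.w": [-0.5, 0.25]}

shared = aggregate_backbone([node_a, node_b])  # performed once, not per round
model_a = build_node_model(node_a, shared)
model_b = build_node_model(node_b, shared)
```

In this toy setup a newly joining node would only need the shared extractor plus its own head, which mirrors the incremental node addition described in the TL;DR; whether the original framework re-aggregates on expansion is not stated in this section.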