
Scaling Federated Learning Solutions with Kubernetes for Synthesizing Histopathology Images

Andrei-Alexandru Preda, Iulian-Marius Tăiatu, Dumitru-Clementin Cercel

TL;DR

The paper tackles data scarcity and privacy in histopathology by combining GAN-generated synthetic images with Vision Transformer classifiers, and validates the approach in a federated setup deployed on a realistic multi-node Kubernetes cluster. It systematically compares GAN variants (CGAN, WGAN-GP, ACGAN) and discriminator architectures, showing that transformer-based discriminators can yield more realistic samples and better downstream performance. The study shows that federated training on non-IID data can replicate centralized results and that a Kubernetes-based deployment is feasible for multi-institution collaboration. It also examines memorization, mode collapse, and class similarity. These findings suggest practical, privacy-preserving augmentation workflows for medical imaging and point to future work on multi-class tasks and diffusion models.
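For reference, the standard objectives of the three compared GAN variants, as defined in their original papers, are summarized below. This summary does not state whether the authors modify them, so read these as background rather than as the paper's exact losses. Notation (assumed here): $x$ is a real image with class label $y$, $z$ is the noise vector, $G$ is the generator, $D$ is the discriminator (the critic for WGAN-GP), and $\lambda$ is the gradient-penalty weight.

CGAN (conditional minimax game):

$$\min_G \max_D \; \mathbb{E}_{x,y}\big[\log D(x \mid y)\big] + \mathbb{E}_{z,y}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]$$

WGAN-GP (critic loss minimized by $D$, with $\tilde{x} = G(z)$, $\hat{x} = \epsilon x + (1-\epsilon)\tilde{x}$, $\epsilon \sim U[0,1]$; the generator minimizes $-\mathbb{E}_z[D(G(z))]$):

$$L_D = \mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big] - \mathbb{E}_{x}\big[D(x)\big] + \lambda\, \mathbb{E}_{\hat{x}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]$$

ACGAN (the discriminator predicts both the source $S$ and the class $C$; $D$ maximizes $L_S + L_C$ and $G$ maximizes $L_C - L_S$):

$$L_S = \mathbb{E}\big[\log P(S=\text{real} \mid x)\big] + \mathbb{E}\big[\log P(S=\text{fake} \mid G(z,y))\big]$$

$$L_C = \mathbb{E}\big[\log P(C=y \mid x)\big] + \mathbb{E}\big[\log P(C=y \mid G(z,y))\big]$$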

Abstract

In the field of deep learning, large architectures often obtain the best performance for many tasks, but also require massive datasets. In the histological domain, tissue images are expensive to obtain and constitute sensitive medical information, raising concerns about data scarcity and privacy. Vision Transformers are state-of-the-art computer vision models that have proven helpful in many tasks, including image classification. In this work, we combine Vision Transformers with generative adversarial networks to generate histopathological images related to colorectal cancer and test their quality by augmenting a training dataset, leading to improved classification accuracy. Then, we replicate this performance using federated learning and a realistic Kubernetes setup with multiple nodes, simulating a scenario where the training dataset is split among several hospitals unable to share their information directly due to privacy concerns.
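In the federated scenario above, hospitals train locally and a server merges their models, but this summary does not name the aggregation rule. The following is a minimal sketch that assumes the widely used FedAvg rule, in which each client's parameters are weighted by its number of local training samples. The function name `fedavg` and the dict-of-lists parameter format are hypothetical stand-ins for real model tensors.

```python
from typing import Dict, List

# Hypothetical flat parameter format: layer name -> list of weights.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """FedAvg-style aggregation (assumed, not confirmed by the paper):
    average each weight across clients, weighted by local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Params = {}
    for name, weights in client_params[0].items():
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(weights))
        ]
    return averaged


# Two hypothetical hospitals holding 100 and 300 local images.
hospital_a = {"w": [1.0, 2.0]}
hospital_b = {"w": [3.0, 4.0]}
print(fedavg([hospital_a, hospital_b], [100, 300]))  # {'w': [2.5, 3.5]}
```

Weighting by sample count means the larger hospital moves the global model three times as far as the smaller one in this toy case. Under strongly non-IID splits, plain averaging like this can drift from the centralized optimum, which is part of what the paper's non-IID experiments probe.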

Paper Structure

This paper contains 27 sections, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Summary of the proposed approach: testing different types of GANs for histopathology images, using synthetic images to improve classification, and exploring this method in a federated setting for increased data privacy.
  • Figure 2: Images generated by ACGAN with a ViT-B/16 discriminator.
  • Figure 3: Comparison of images generated in the centralized vs. federated settings.
  • Figure 4: Examples of colonoscopy images when the non-IID ratio is $0.9$. Note that an epoch here refers to one training round of the server model. Each round iterates over each client's dataset multiple times.
  • Figure 5: Kubernetes cluster architecture used for federated learning experiments.
  • ...and 2 more figures
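Figure 4 refers to a non-IID ratio of $0.9$, but this summary does not define the term. One common label-skew construction is assumed here purely for illustration. Each class has a "home" client that receives a `ratio` share of that class's images, and the remainder is spread over the other clients. The function `non_iid_split` and its parameters are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, int]  # (image identifier, class label)


def non_iid_split(samples: Sequence[Sample], n_clients: int, ratio: float,
                  seed: int = 0) -> List[List[Sample]]:
    """Assumed label-skew split: a `ratio` share of class c goes to client
    c % n_clients; the rest is dealt round-robin to the remaining clients."""
    if n_clients < 2 or not 0.0 <= ratio <= 1.0:
        raise ValueError("need at least two clients and ratio in [0, 1]")
    rng = random.Random(seed)
    by_class: Dict[int, List[Sample]] = {}
    for sample in samples:
        by_class.setdefault(sample[1], []).append(sample)
    clients: List[List[Sample]] = [[] for _ in range(n_clients)]
    for label, items in sorted(by_class.items()):
        rng.shuffle(items)  # randomize which images stay with the home client
        home = label % n_clients
        cut = round(ratio * len(items))
        clients[home].extend(items[:cut])
        others = [c for c in range(n_clients) if c != home]
        for i, sample in enumerate(items[cut:]):
            clients[others[i % len(others)]].append(sample)
    return clients


# Hypothetical binary dataset: 100 images per class, two hospitals, ratio 0.9.
data = [(f"img{i}", i % 2) for i in range(200)]
for k, part in enumerate(non_iid_split(data, n_clients=2, ratio=0.9)):
    # Client 0 holds 90 class-0 and 10 class-1 images; client 1 the reverse.
    print(k, Counter(label for _, label in part))
```

With `ratio=0.5` and two clients, this construction gives each client an even class mix (an IID-like split). Raising the ratio toward $1.0$ makes each client see almost only its home class.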