Scaling Federated Learning Solutions with Kubernetes for Synthesizing Histopathology Images
Andrei-Alexandru Preda, Iulian-Marius Tăiatu, Dumitru-Clementin Cercel
TL;DR
The paper tackles data scarcity and privacy in histopathology by combining GAN-generated synthetic images with Vision Transformer classifiers and validating a production-grade federated framework implemented on Kubernetes. It systematically compares GAN variants (CGAN, WGAN-GP, ACGAN) and discriminator architectures, showing that transformer-based discriminators can yield more realistic samples and better downstream performance. The study demonstrates that federated training with non-IID data can replicate centralized results and that a Kubernetes-based deployment is feasible for multi-institution collaboration, while also examining memorization, mode collapse, and class-similarity considerations. These findings suggest practical, privacy-preserving augmentation workflows for medical imaging and point to future work on multi-class tasks and diffusion models.
Abstract
In deep learning, large architectures often achieve the best performance on many tasks but also require massive datasets. In the histological domain, tissue images are expensive to obtain and constitute sensitive medical information, raising concerns about data scarcity and privacy. Vision Transformers are state-of-the-art computer vision models that have proven helpful in many tasks, including image classification. In this work, we combine Vision Transformers with generative adversarial networks (GANs) to generate histopathological images related to colorectal cancer and test their quality by augmenting a training dataset with them, which improves classification accuracy. We then replicate this performance using federated learning on a realistic multi-node Kubernetes setup, simulating a scenario in which the training dataset is split among several hospitals that cannot share their data directly due to privacy concerns.
