FedSDA: Federated Stain Distribution Alignment for Non-IID Histopathological Image Classification
Cheng-Chang Tsai, Kai-Wen Cheng, Chun-Shien Lu
TL;DR
This work tackles non-IID distribution shifts in federated histopathology by moving the focus from model updates to data distributions. It introduces FedSDA, which aligns stain distributions across clients by decomposing images via stain separation and training a conditional diffusion model on the resulting stain matrices to fit a global target distribution, without sharing raw data. FedSDA improves tumor classification performance and image quality on the MA14, CAMELYON17, and AGGC22 datasets, outperforming optimization-based and normalization-based baselines as well as CCST and amp-norm. The results demonstrate a practical, privacy-preserving, data-centric solution for distribution shifts in computational pathology, with potential for real-world FL deployments.
Abstract
Federated learning (FL) has shown success in collaboratively training a model across decentralized data sources without directly sharing privacy-sensitive training data. Despite recent advances, non-IID (non-independent and identically distributed) data remains an unavoidable challenge that hinders the use of FL. In this work, we address non-IID histopathological images with feature distribution shifts from an intuitive perspective that has received only limited attention: we adjust the data distributions of the clients themselves, and nothing else. Building on the success of diffusion models in fitting data distributions, and leveraging stain separation to extract the pivotal features most closely tied to the non-IID properties of histopathological images, we propose the Federated Stain Distribution Alignment (FedSDA) method. FedSDA aligns the stain distribution of each client with a target distribution within an FL framework to mitigate distribution shifts among clients. Furthermore, because training diffusion models on raw data in FL has been shown to be susceptible to privacy leakage, we circumvent this problem while still achieving effective alignment. Extensive experimental results show that FedSDA not only improves baselines that focus on mitigating disparities across clients' model updates but also outperforms baselines that address non-IID data from the perspective of data distribution. We show that FedSDA provides valuable and practical insights for the computational pathology community.
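
To make the idea of stain separation and stain alignment concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Beer-Lambert optical-density model, a fixed two-stain (hematoxylin and eosin) matrix using the commonly cited Ruifrok and Johnston reference vectors, and a least-squares solve for per-pixel stain concentrations. The abstract does not specify the stain separation method, and the target matrix that FedSDA would obtain from its diffusion model is replaced here by a fixed, hypothetical `target` matrix. All function and variable names are illustrative.

```python
import numpy as np

# Assumed reference H&E stain vectors in optical-density (OD) space
# (rows: hematoxylin, eosin). Illustrative only, not taken from the paper.
REF_HE = np.array([[0.65, 0.70, 0.29],
                   [0.07, 0.99, 0.11]])


def unit_rows(m):
    """Scale each stain vector (row) to unit length."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def rgb_to_od(rgb, background=255.0):
    """Beer-Lambert transform: OD = -log(I / I0)."""
    rgb = np.clip(np.asarray(rgb, dtype=np.float64), 1.0, background)
    return -np.log(rgb / background)


def od_to_rgb(od, background=255.0):
    """Inverse transform back to 8-bit RGB."""
    rgb = background * np.exp(-od)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


def separate(rgb, stain_matrix):
    """Solve OD = C @ S in the least-squares sense for concentrations C."""
    od = rgb_to_od(rgb).reshape(-1, 3)
    conc, *_ = np.linalg.lstsq(stain_matrix.T, od.T, rcond=None)
    return conc.T  # shape: (n_pixels, 2)


def restain(conc, target_matrix, shape):
    """Re-render the same concentrations with a different stain matrix."""
    return od_to_rgb(conc @ target_matrix).reshape(shape)


# Synthetic demonstration: a "client" image rendered with a source stain
# matrix is re-rendered with a hypothetical target stain matrix.
rng = np.random.default_rng(0)
source = unit_rows(REF_HE)
target = unit_rows(REF_HE + np.array([[0.10, -0.05, 0.05],
                                      [0.05, -0.10, 0.10]]))
true_conc = rng.uniform(0.1, 1.0, size=(16, 2))
client_img = od_to_rgb(true_conc @ source).reshape(4, 4, 3)

conc = separate(client_img, source)                    # tissue content
aligned_img = restain(conc, target, client_img.shape)  # new stain appearance
```

In this sketch the concentrations carry the tissue content while the stain matrix carries the site-specific color appearance, so swapping only the matrix changes the stain appearance without altering the underlying structure. In a federated setting, each client would apply such a re-staining step locally, which is consistent with the abstract's claim that raw images need not be shared.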
