Stain-aware Domain Alignment for Imbalance Blood Cell Classification
Yongcheng Li, Lingcong Cai, Ying Lu, Xianghua Fu, Xiao Han, Ma Li, Wenxing Lai, Xiangzhong Zhang, Xiaomao Fan
TL;DR
This work tackles domain shift and data imbalance in multi-source blood cell image classification by introducing SADA, a stain-aware domain alignment framework. SADA combines stain-based augmentation to create domain-transformed samples, a local alignment mechanism to enforce pixel-level feature consistency, and domain-invariant supervised contrastive learning to learn discriminative, domain-agnostic representations. Training is split into two stages, feature learning followed by classifier training, to mitigate imbalance. Empirical results on four public datasets and the private SYSU3H dataset demonstrate state-of-the-art performance and strong cross-domain generalization, with ablations validating each component. The approach offers practical potential for reliable hematology image analysis in real-world, multi-center settings. Overall, SADA advances domain generalization for imbalanced biomedical image classification by explicitly modeling stain variation and learning domain-invariant features.
Abstract
Blood cell identification is critical for hematological analysis, as it aids physicians in diagnosing various blood-related diseases. In real-world scenarios, blood cell image datasets often suffer from domain shift and data imbalance, which pose challenges for accurate blood cell identification. To address these issues, we propose SADA, a novel blood cell classification method based on stain-aware domain alignment. The primary objective of this work is to mine domain-invariant features in the presence of domain shift and data imbalance. To this end, we propose a stain-based augmentation approach and a local alignment constraint to learn domain-invariant features. Furthermore, we propose a domain-invariant supervised contrastive learning strategy to capture discriminative features. We decouple the training process into two stages, domain-invariant feature learning and classification training, which alleviates the problem of data imbalance. Experimental results on four public blood cell datasets and a private real-world dataset collected from the Third Affiliated Hospital of Sun Yat-sen University demonstrate that SADA achieves new state-of-the-art performance, outperforming existing cutting-edge methods by a large margin. The source code is available at https://github.com/AnoK3111/SADA.
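To make the contrastive component more concrete, the following is a minimal sketch of a generic supervised contrastive loss, in which samples sharing a class label are pulled together and all others are pushed apart. It is not the SADA implementation: the function name `supervised_contrastive_loss`, the `temperature` default, and the pure-Python list representation are assumptions made for illustration, and the domain-invariant variant proposed here may differ in how positives are selected across stain-augmented views.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative only).

    embeddings: list of equal-length lists of floats, one per sample
    labels:     list of integer class labels, one per sample
    """
    # L2-normalise every embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, num_anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples in the batch with the same class label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Numerically stable log-sum-exp over all non-anchor samples.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(s - max_logit) for s in logits.values())
        )

        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1

    return total / num_anchors if num_anchors else 0.0


# Toy batch: two tight clusters in 2-D (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
misaligned_loss = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

On the toy batch, labels that agree with the geometric clusters give a much smaller loss than labels that cut across them, which is the behavior a contrastive objective is meant to reward. In a stain-aware setting, one plausible extension (again an assumption) is to place each image and its stain-augmented counterpart in the same batch with the same label, so that stain variation alone never separates positives.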
