Partially Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation
Lei Zhu, Yanyu Xu, Huazhu Fu, Xinxing Xu, Rick Siow Mong Goh, Yong Liu
TL;DR
This work tackles label-efficient segmentation when medical images come from unpaired multi-modal sources with disjoint or partially overlapping label sets. It introduces Partially Supervised Unpaired Multi-Modal Learning (PSUMML) and the Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework, which combines a compact segmentation network with modality-specific normalization, two class-conditional domain discriminators, and a decomposed partial class adaptation loss to minimize cross-modality distribution discrepancies. A decomposition theorem guides the design by linking total multi-modal error to empirical margin errors, partial-class distribution discrepancies, and model complexity, while snapshot ensembles provide robust pseudo-labels to further supervise learning on partially labeled pixels. Empirically, DEST achieves significant improvements over state-of-the-art methods on cardiac substructure and abdominal multi-organ segmentation, reaching near full-annotation performance at about 50% labeling cost and demonstrating strong potential for reducing annotation burden in clinical practice.
Abstract
Unpaired Multi-Modal Learning (UMML), which leverages unpaired multi-modal data to boost model performance on each individual modality, has attracted considerable research interest in medical image analysis. However, existing UMML methods require multi-modal datasets to be fully labeled, which incurs a substantial annotation cost. In this paper, we investigate the use of partially labeled data for label-efficient unpaired multi-modal learning, which can reduce the annotation cost by up to one half. We term this new learning paradigm Partially Supervised Unpaired Multi-Modal Learning (PSUMML) and propose a novel Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework for it. Specifically, our framework consists of a compact segmentation network with modality-specific normalization layers for learning from partially labeled unpaired multi-modal data. The key challenge in PSUMML lies in the complex partial class distribution discrepancy caused by partial class annotation, which hinders effective knowledge transfer across modalities. We theoretically analyze this phenomenon with a decomposition theorem and propose a decomposed partial class adaptation technique that precisely aligns the partially labeled classes across modalities to reduce the distribution discrepancy. We further propose a snapshot ensembled self-training technique that uses the snapshot models saved during training to assign pseudo-labels to partially labeled pixels, which then serve as additional supervision to boost model performance. We perform extensive experiments under different PSUMML scenarios on two medical image segmentation tasks, namely cardiac substructure segmentation and abdominal multi-organ segmentation. Our framework significantly outperforms existing methods.
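The snapshot ensembled self-training idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that snapshot predictions are combined by averaging per-pixel class probabilities and that only confident predictions are kept. The function name `ensemble_pseudo_labels`, the `IGNORE` marker, and the `threshold` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

IGNORE = -1  # hypothetical marker: pixel receives no pseudo-label


def ensemble_pseudo_labels(
    snapshot_probs: Sequence[Sequence[Sequence[float]]],
    labeled_mask: Sequence[bool],
    threshold: float = 0.8,
) -> List[int]:
    """Assign pseudo-labels to pixels that carry no ground-truth annotation.

    snapshot_probs[s][p][c] is the probability of class c at pixel p
    predicted by snapshot model s (assumed to be softmax outputs).
    labeled_mask[p] is True where pixel p already has an annotation.
    """
    num_snapshots = len(snapshot_probs)
    pseudo: List[int] = []
    for p, is_labeled in enumerate(labeled_mask):
        if is_labeled:
            # Annotated pixels keep their ground truth; no pseudo-label needed.
            pseudo.append(IGNORE)
            continue
        num_classes = len(snapshot_probs[0][p])
        # Assumed ensembling rule: average class probabilities over snapshots.
        mean = [
            sum(snapshot_probs[s][p][c] for s in range(num_snapshots)) / num_snapshots
            for c in range(num_classes)
        ]
        best = max(range(num_classes), key=lambda c: mean[c])
        # Assumed filtering rule: keep only confident ensemble predictions.
        pseudo.append(best if mean[best] >= threshold else IGNORE)
    return pseudo


# Two snapshots, three pixels, two classes (toy values).
probs = [
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]],
]
labels = ensemble_pseudo_labels(probs, labeled_mask=[True, False, False])
```

In this toy call the first pixel is skipped because it is already annotated, the second is left unlabeled because the snapshots disagree, and the third receives a pseudo-label for class 1. In a real pipeline the retained pseudo-labels would feed a segmentation loss alongside the ground-truth annotations.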
