DINO-Mix: Distilling Foundational Knowledge with Cross-Domain CutMix for Semi-supervised Class-imbalanced Medical Image Segmentation
Xinyu Liu, Guolei Sun
TL;DR
DINO-Mix tackles the failure modes of inward-looking semi-supervised medical segmentation under extreme class imbalance by introducing an outward-looking paradigm. It uses a frozen DINOv3 encoder as an external semantic teacher to provide robust supervision for minority classes (FKD) and couples this with a Progressive Imbalance-aware CutMix (PIC) curriculum that initially emphasizes tail classes and gradually shifts to uniform sampling. The method yields state-of-the-art results on Synapse (20% labels) and AMOS (5% labels), with substantial gains on challenging organs and robust performance across runs. By distilling cross-domain semantic knowledge and evolving data augmentation through training, DINO-Mix mitigates confirmation bias and improves both segmentation accuracy and boundary delineation in imbalanced settings.
Abstract
Semi-supervised learning (SSL) has emerged as a critical paradigm for medical image segmentation, mitigating the immense cost of dense annotations. However, prevailing SSL frameworks are fundamentally "inward-looking": they recycle information and biases solely from within the target dataset. Under class imbalance, this design triggers a vicious cycle of confirmation bias that leads to a catastrophic failure to recognize minority classes. To dismantle this systemic issue, we propose a paradigm shift to a multi-level "outward-looking" framework. Our primary innovation is Foundational Knowledge Distillation (FKD), which looks outward beyond the confines of medical imaging by introducing a pre-trained visual foundation model, DINOv3, as an unbiased external semantic teacher. Instead of trusting the student's biased high-confidence predictions, our method distills knowledge from DINOv3's robust understanding of high semantic uniqueness, providing a stable, cross-domain supervisory signal that anchors the learning of minority classes. To complement this core strategy, we further look outward within the data by proposing Progressive Imbalance-aware CutMix (PIC), which creates a dynamic curriculum that adaptively forces the model to focus on minority classes in both the labeled and unlabeled subsets. This layered strategy forms our framework, DINO-Mix, which breaks the vicious cycle of bias and achieves strong performance on two challenging semi-supervised class-imbalanced medical image segmentation benchmarks, Synapse and AMOS.
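To make the PIC curriculum more concrete, the following is a minimal sketch of one way a class-sampling schedule could move from tail-class emphasis to uniform sampling over training. It is an illustration under stated assumptions, not the authors' implementation: the function name `pic_class_probs`, the inverse-frequency weighting, and the linear interpolation schedule are all hypothetical choices, and the class frequencies in the usage lines are made-up numbers.

```python
import random


def pic_class_probs(class_freqs, step, total_steps):
    """Hypothetical PIC-style schedule (assumed, not taken from the paper).

    Returns per-class probabilities for choosing which class to paste in
    CutMix. At step 0 the probabilities are proportional to inverse class
    frequency (tail classes favored); at the final step they are uniform.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if any(f <= 0 for f in class_freqs):
        raise ValueError("class frequencies must be positive")

    # Training progress in [0, 1]; linear interpolation is an assumption.
    t = min(max(step / total_steps, 0.0), 1.0)

    # Inverse-frequency distribution: rarer classes get larger weight.
    inv = [1.0 / f for f in class_freqs]
    total = sum(inv)
    tail_focused = [w / total for w in inv]

    # Blend the tail-focused distribution with the uniform distribution.
    uniform = 1.0 / len(class_freqs)
    return [(1.0 - t) * p + t * uniform for p in tail_focused]


# Usage with illustrative (made-up) voxel frequencies for three classes.
freqs = [0.90, 0.08, 0.02]
start = pic_class_probs(freqs, step=0, total_steps=1000)
end = pic_class_probs(freqs, step=1000, total_steps=1000)

# Draw the class whose region would be pasted at the current step.
rng = random.Random(0)
pasted_class = rng.choices(range(len(freqs)), weights=start, k=1)[0]
```

Because both endpoints are valid probability distributions, every intermediate blend also sums to one, so the sampler can be queried at any training step without renormalization.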
