SGTC: Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation
Ke Yan, Qing Cai, Fan Zhang, Ziyan Cao, Zhi Liu
TL;DR
The paper addresses the high cost of full volumetric annotations in medical image segmentation by proposing SGTC, which learns from only three orthogonal slices per labeled volume while leveraging semantic guidance from text via CLIP. It combines a semantic-guided auxiliary learning (SGAL) module with a triple-view disparity training (TVDT) strategy to jointly train three plane-specific networks, using cross-modal features and dynamic unsupervised losses to produce high-quality pseudo-labels. Extensive experiments on LA2018, KiTS19, and LiTS show that SGTC outperforms most state-of-the-art methods under sparse labeling, with ablations confirming the contributions of SGAL, TVDT, and CLIP-based prompts. The work advances practical semi-supervised segmentation by integrating language semantics with multi-view co-training, potentially reducing radiologist workload in clinical workflows.
Abstract
Although semi-supervised learning has made significant advances in medical image segmentation, fully annotating a volumetric sample slice by slice remains costly and time-consuming. Worse, most existing approaches focus on image-level information and ignore semantic features, which leaves them unable to perceive weak boundaries. To address these issues, we propose a novel Semantic-Guided Triplet Co-training (SGTC) framework, which achieves high-quality medical image segmentation by annotating only three orthogonal slices of a few volumetric samples, significantly alleviating the burden on radiologists. Our method consists of two main components. First, to enable semantic-aware, fine-grained segmentation and enhance the quality of pseudo-labels, a novel semantic-guided auxiliary learning mechanism is proposed based on the pretrained CLIP. Second, focusing on a more challenging but clinically realistic scenario, a new triple-view disparity training strategy is proposed, which uses sparse annotations (i.e., only three labeled slices of a few volumes) to perform co-training between three sub-networks, significantly improving robustness. Extensive experiments on three public medical datasets demonstrate that our method outperforms most state-of-the-art semi-supervised counterparts under sparse annotation settings. The source code is available at https://github.com/xmeimeimei/SGTC.
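To make the sparse annotation setting concrete, the following is a minimal sketch, not the authors' implementation, of how three orthogonal labeled slices could be turned into a voxel mask and used to restrict a supervised loss to annotated voxels. The function names, the choice of the central slices, and the binary cross-entropy loss are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def sparse_orthogonal_mask(shape, index=None):
    """Boolean mask that is True only on three orthogonal slices.

    `shape` is (depth, height, width). `index` gives the slice position
    along each axis; the central slices are used by default (an assumption).
    """
    d, h, w = shape
    if index is None:
        index = (d // 2, h // 2, w // 2)
    z, y, x = index
    mask = np.zeros(shape, dtype=bool)
    mask[z, :, :] = True  # one slice with the depth index fixed
    mask[:, y, :] = True  # one slice with the height index fixed
    mask[:, :, x] = True  # one slice with the width index fixed
    return mask


def masked_cross_entropy(prob_fg, label, mask, eps=1e-7):
    """Binary cross-entropy averaged over annotated voxels only."""
    p = np.clip(prob_fg, eps, 1.0 - eps)
    ce = -(label * np.log(p) + (1 - label) * np.log(1.0 - p))
    return float(ce[mask].mean())


# Hypothetical toy volume: only voxels on the three slices contribute.
volume_shape = (4, 5, 6)
mask = sparse_orthogonal_mask(volume_shape)
label = np.zeros(volume_shape, dtype=np.float64)
prob = np.full(volume_shape, 0.5)
loss = masked_cross_entropy(prob, label, mask)
```

In this toy setup only half of the voxels are annotated, and on realistic volume sizes the fraction is far smaller. In a co-training setting, the unlabeled voxels would instead be supervised by pseudo-labels exchanged between the sub-networks.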
