SGTC: Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation

Ke Yan, Qing Cai, Fan Zhang, Ziyan Cao, Zhi Liu

TL;DR

The paper addresses the high cost of full volumetric annotations in medical image segmentation by proposing SGTC, which learns from only three orthogonal slices per volume while leveraging semantic guidance from text via CLIP. It combines a semantic-guided auxiliary learning module with a triple-view disparity training strategy to jointly train three plane-specific networks, using cross-modal features and dynamic unsupervised losses to produce high-quality pseudo-labels. Extensive experiments on LA2018, KiTS19, and LiTS show SGTC outperforms most state-of-the-art methods under sparse labeling, with ablations confirming the contributions of SGAL, TVDT, and CLIP-based prompts. The work advances practical semi-supervised segmentation by integrating language semantics with multi-view co-training, potentially reducing radiologist workload in clinical workflows.

Abstract

Although semi-supervised learning has made significant advances in the field of medical image segmentation, fully annotating a volumetric sample slice by slice remains a costly and time-consuming task. Even worse, most existing approaches focus on image-level information and ignore semantic features, which leaves them unable to perceive weak boundaries. To address these issues, we propose a novel Semantic-Guided Triplet Co-training (SGTC) framework, which achieves high-quality medical image segmentation by annotating only three orthogonal slices of a few volumetric samples, significantly alleviating the burden on radiologists. Our method consists of two main components. First, to enable semantic-aware, fine-grained segmentation and enhance the quality of pseudo-labels, a novel semantic-guided auxiliary learning mechanism is proposed based on the pretrained CLIP. Second, focusing on a more challenging but clinically realistic scenario, a new triple-view disparity training strategy is proposed, which uses sparse annotations (i.e., only three labeled slices of a few volumes) to perform co-training between three sub-networks, significantly improving robustness. Extensive experiments on three public medical datasets demonstrate that our method outperforms most state-of-the-art semi-supervised counterparts under sparse annotation settings. The source code is available at https://github.com/xmeimeimei/SGTC.

Paper Structure

This paper contains 12 sections, 10 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: (a) Visual comparison showing the effect of semantic guidance. (b) Performance under different sparse annotation strategies. S, C, and A indicate sagittal, coronal, and axial annotated slices.
  • Figure 2: Architecture of the proposed SGTC framework. For volumes with sparse orthogonal labels, each volume has three corresponding labels. For model $F_s(\cdot)$, the supervision signals are selected from the coronal and axial planes, for model $F_c(\cdot)$ from the sagittal and axial planes, and for model $F_a(\cdot)$ from the coronal and sagittal planes. For volumes without labels, the segmentation result of each of $F_s(\cdot)$, $F_c(\cdot)$, and $F_a(\cdot)$ acts as a cross-supervision signal for the other sub-networks.
  • Figure 3: Qualitative comparisons on the LA2018 dataset with 10% labeled cases. From left to right: segmentation results of MT, UAMT, SASSNet, CPS, DTC, BCP, Desco, our SGTC, and ground truth (GT), respectively.
  • Figure 4: Qualitative comparisons on the KiTS19 dataset with 10% labeled cases.
  • Figure 5: Qualitative comparisons on the LiTS dataset with 10% labeled cases.
  • ...and 2 more figures
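
The plane pairing in the Figure 2 caption can be illustrated with a small sketch. It is not taken from the authors' code. The function names, the axis convention (sagittal = axis 0, coronal = axis 1, axial = axis 2), and the nested-list mask are assumptions made for illustration. Only the rule that each sub-network is supervised by the labeled slices of the other two planes comes from the caption.

```python
# Illustrative sketch only: names and axis convention are assumptions,
# not the SGTC repository's API.
PLANES = ("sagittal", "coronal", "axial")

# Assumed mapping from plane name to volume axis (hypothetical convention).
AXIS_OF = {"sagittal": 0, "coronal": 1, "axial": 2}


def supervising_planes(model_plane):
    """Planes whose labeled slices supervise the sub-network for model_plane.

    Following the Figure 2 caption, each sub-network uses the two planes
    other than its own (e.g. F_s uses coronal and axial).
    """
    if model_plane not in PLANES:
        raise ValueError(f"unknown plane: {model_plane!r}")
    return tuple(p for p in PLANES if p != model_plane)


def labeled_voxel_mask(shape, slice_index, model_plane):
    """Boolean mask (nested lists) of voxels that carry a label for one sub-network.

    shape       -- (d0, d1, d2) volume size
    slice_index -- dict mapping each plane name to its single labeled slice index
    """
    used = supervising_planes(model_plane)
    d0, d1, d2 = shape
    mask = [[[False] * d2 for _ in range(d1)] for _ in range(d0)]
    for i in range(d0):
        for j in range(d1):
            for k in range(d2):
                idx = (i, j, k)
                # A voxel is supervised if it lies on any labeled slice in use.
                mask[i][j][k] = any(idx[AXIS_OF[p]] == slice_index[p] for p in used)
    return mask


def count_labeled(mask):
    """Number of supervised voxels in a nested-list mask."""
    return sum(v for plane in mask for row in plane for v in row)
```

For a toy 4×4×4 volume with one labeled slice per plane, `labeled_voxel_mask((4, 4, 4), {"sagittal": 1, "coronal": 1, "axial": 1}, "sagittal")` marks only voxels on the coronal and axial slices. A supervised loss would then be evaluated on those voxels alone, while unlabeled volumes rely on the cross-supervision described in the caption.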