Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training

Jian Wang, Xin Yang, Xiaohong Jia, Wufeng Xue, Rusi Chen, Yanlin Chen, Xiliang Zhu, Lian Liu, Yan Cao, Jianqiao Zhou, Dong Ni, Ning Gu

TL;DR

This paper proposes a multi-view contrastive self-supervised method that improves thyroid nodule classification and segmentation with limited manual labels and outperforms state-of-the-art self-supervised methods.

Abstract

Thyroid nodule classification and segmentation in ultrasound images are crucial for computer-aided diagnosis; however, they face limitations owing to insufficient labeled data. In this study, we proposed a multi-view contrastive self-supervised method to improve thyroid nodule classification and segmentation performance with limited manual labels. Our method aligns the transverse and longitudinal views of the same nodule, thereby enabling the model to focus more on the nodule area. We designed an adaptive loss function that eliminates the limitations of the paired data. Additionally, we adopted a two-stage pre-training to exploit the pre-training on ImageNet and thyroid ultrasound images. Extensive experiments were conducted on a large-scale dataset collected from multiple centers. The results showed that the proposed method significantly improves nodule classification and segmentation performance with limited manual labels and outperforms state-of-the-art self-supervised methods. The two-stage pre-training also significantly exceeded ImageNet pre-training.

Paper Structure (24 sections, 6 equations, 8 figures, 11 tables)

Figures (8)

  • Figure 1: The upper row shows thyroid ultrasound images of different patients, and the lower row shows the corresponding nodule masks. Red arrows indicate nodules and green arrows indicate carotid vessels.
  • Figure 2: The color image on the left is a schematic of the thyroid; thyroid ultrasound images from four patients are on the right. The upper row shows the longitudinal views, and the lower row shows the corresponding transverse views. The yellow arrows point to the same nodule.
  • Figure 3: Our framework adopts independent query and momentum encoders for each view, and the two views share the same memory bank.
  • Figure 4: Two-stage pre-training. In the first stage, we train the model on ImageNet in a supervised and self-supervised learning manner. In the second stage, we first initialize the model with the learned weights from the first stage and train the model on unlabeled target medical images in a self-supervised manner. Finally, the model is fine-tuned for the target tasks.
  • Figure 5: Networks for the three target tasks. For NC, we use ResNet50 as the backbone and a single fully connected layer as the classifier. For NS, we use UNet as the network. For MNC, each of the two views has a network consisting of ResNet50 and a single fully connected layer, and the two networks share the same weights.
  • ...and 3 more figures
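
The framework described in Figure 3 (per-view query and momentum encoders with one shared memory bank) can be illustrated with a minimal MoCo-style sketch. This is not the paper's implementation: the plain InfoNCE loss stands in for the adaptive loss mentioned in the abstract, whose form is not given here, and the function names, the momentum coefficient `m`, and the `temperature` value are all assumptions chosen for illustration. Embeddings are plain lists standing in for encoder outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def momentum_update(query_params, momentum_params, m=0.999):
    """EMA update: the momentum encoder slowly tracks its query encoder.

    The value m=0.999 is an assumed default, not taken from the paper.
    """
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, momentum_params)]


def info_nce(query, positive_key, memory_bank, temperature=0.07):
    """InfoNCE loss for one query: one positive key against bank negatives."""
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in memory_bank]
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # Negative log-probability of the positive key (index 0).
    return log_sum - logits[0]


def cross_view_loss(q_trans, k_trans, q_long, k_long, memory_bank):
    """Symmetric cross-view loss (assumed form).

    Each view's query is matched to the other view's momentum key,
    and both terms draw negatives from the same shared memory bank.
    """
    loss_t = info_nce(q_trans, k_long, memory_bank)
    loss_l = info_nce(q_long, k_trans, memory_bank)
    return 0.5 * (loss_t + loss_l)


if __name__ == "__main__":
    bank = [[0.0, 1.0], [-1.0, 0.0]]  # hypothetical negatives from other nodules
    aligned = cross_view_loss([1.0, 0.1], [1.0, 0.0], [0.9, 0.0], [1.0, 0.2], bank)
    print(f"cross-view loss for aligned views: {aligned:.4f}")
```

In this sketch the loss is small when the transverse and longitudinal embeddings of the same nodule point in similar directions and grows when a query is closer to a bank negative than to its cross-view key.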