Weakly-supervised Medical Image Segmentation with Gaze Annotations

Yuan Zhong, Chenhui Tang, Yumeng Yang, Ruoxi Qi, Kang Zhou, Yuqi Gong, Pheng Ann Heng, Janet H. Hsiao, Qi Dou

TL;DR

This work tackles the high cost of pixel-wise annotations in medical image segmentation by introducing gaze-based dense weak supervision. It presents a multi-level learning framework that trains multiple networks from gaze heatmaps using hierarchical thresholds to mimic discriminative human attention, coupled with a cross-level consistency regularizer to counteract gaze noise. The authors also contribute the GazeMedSeg dataset, extending Kvasir-SEG and NCI-ISBI with gaze data, and demonstrate that gaze supervision substantially narrows the gap to full supervision on polyp and prostate segmentation while reducing annotation time. The approach outperforms existing label-efficient schemes and provides a practical, gaze-enabled pipeline with publicly released data and code.

Abstract

Eye gaze, which reveals human observational patterns, has increasingly been incorporated into solutions for vision tasks. Despite recent explorations of leveraging gaze to aid deep networks, few studies exploit gaze as an efficient annotation approach for medical image segmentation, which typically entails heavy annotation costs. In this paper, we propose to collect dense weak supervision for medical image segmentation with a gaze annotation scheme. To train with gaze, we propose a multi-level framework that trains multiple networks from discriminative human attention, simulated with a set of pseudo-masks derived by applying hierarchical thresholds to gaze heatmaps. Furthermore, to mitigate gaze noise, a cross-level consistency is exploited to regularize against overfitting to noisy labels, steering models toward clean patterns learned by peer networks. The proposed method is validated on two public medical datasets for polyp and prostate segmentation. We contribute a high-quality gaze dataset, GazeMedSeg, as an extension to these popular medical segmentation datasets. To the best of our knowledge, this is the first gaze dataset for medical image segmentation. Our experiments demonstrate that gaze annotation outperforms previous label-efficient annotation schemes in terms of both performance and annotation time. Our collected gaze data and code are available at: https://github.com/med-air/GazeMedSeg.
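
The pseudo-mask step described in the abstract can be illustrated with a minimal sketch. It assumes the gaze heatmap is a 2D grid of non-negative fixation densities that is min-max normalized before thresholding; the function names, the toy heatmap, and the threshold values (0.2 and 0.5) are hypothetical choices for illustration and are not taken from the paper or its released code.

```python
from typing import List, Sequence

Heatmap = List[List[float]]
Mask = List[List[int]]


def normalize(heatmap: Heatmap) -> Heatmap:
    """Min-max rescale a heatmap to [0, 1] (assumed preprocessing step)."""
    lo = min(min(row) for row in heatmap)
    hi = max(max(row) for row in heatmap)
    span = hi - lo
    if span == 0:
        # A flat heatmap carries no attention signal; map it to all zeros.
        return [[0.0 for _ in row] for row in heatmap]
    return [[(v - lo) / span for v in row] for row in heatmap]


def pseudo_masks(heatmap: Heatmap, thresholds: Sequence[float]) -> List[Mask]:
    """Binarize one gaze heatmap at several thresholds, one mask per level.

    A low threshold activates more foreground (and more false positives);
    a high threshold keeps only the most attended pixels.
    """
    norm = normalize(heatmap)
    return [
        [[1 if v >= t else 0 for v in row] for row in norm]
        for t in thresholds
    ]


# Toy 3x4 heatmap with a single attention peak (hypothetical values).
gaze = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 4.0, 8.0, 2.0],
    [0.0, 2.0, 4.0, 1.0],
]

# Two levels, each of which would supervise its own network.
masks = pseudo_masks(gaze, thresholds=(0.2, 0.5))
for t, mask in zip((0.2, 0.5), masks):
    print(f"threshold {t}: {sum(map(sum, mask))} foreground pixels")
```

With increasing thresholds the masks are nested: every foreground pixel at a higher level is also foreground at each lower level, which is what lets the levels trade off foreground against background activation.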

Paper Structure (13 sections, 2 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Illustrations of full and different label-efficient annotation schemes. Dense binarized gaze pseudo-masks are generated with various thresholds $t$, which trade off the activation of the foreground and background.
  • Figure 1: Comparison with full mask supervision and SOTA weakly-supervised methods using different annotation schemes. We report the mean and standard deviation of three runs with different seeds. Dice is used as the evaluation metric. The reported annotation time is the estimated time to annotate the 900 images in the Kvasir-SEG [jha2020kvasir] training set.
  • Figure 2: (a) Overview of the proposed method. For simplicity, we present the case with two levels and show $\mathcal{L}_{cons}$ for network 1 only. In the implementation, the consistency loss is applied to all networks. (b) We visualize the dynamics of early learning (the Dice between the output and the ground truth on wrongly annotated pixels) and overfitting (the Dice between the output and the noisy gaze pseudo-mask on wrongly annotated pixels), with and without the proposed consistency regularization, on the Kvasir-SEG [jha2020kvasir] training data. The proposed consistency prevents overfitting to the noisy labels. We use two levels and plot the average over all levels in this experiment. (c) We visualize the gradients of the cross-entropy and consistency terms during training. The gradients that encourage dilation and erosion of the predicted target are scaled for visualization and shown in different colors. The cross-entropy term gives noisy supervision toward erosion at the top of the target object, which is compensated by consistency with the clean dilation patterns learned by other networks.
  • Figure 3: Performance versus annotation time for different annotation schemes. To match annotation times across annotation forms, we train a 2D UNet model on 10% to 100% of the Kvasir-SEG training set.
  • Figure 4: Visualization of gaze data and predictions. The model without the consistency term ensembles the noise of different levels. In contrast, the model regularized by consistency learns the clean patterns of the pseudo-masks and is robust to noise.
  • ...and 2 more figures
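
The captions above report Dice as the evaluation metric. As a reminder of what that number measures, here is a minimal sketch of the standard Dice coefficient for two binary masks. The function name and the toy masks are hypothetical, and returning 1.0 when both masks are empty is an assumed convention, not something stated in the source.

```python
from typing import List

Mask = List[List[int]]


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary masks."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy example: the prediction covers the target plus one extra pixel.
prediction = [[0, 1, 1], [0, 1, 0]]
reference = [[0, 1, 1], [0, 0, 0]]
score = dice(prediction, reference)
print(f"Dice = {score:.2f}")  # 2 * 2 / (3 + 2) = 0.80
```

The same function can be restricted to a subset of pixels (for example, only those where a pseudo-mask disagrees with the ground truth) by masking both inputs before calling it, which is the kind of measurement the Figure 2(b) caption describes.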