Enjoying Information Dividend: Gaze Track-based Medical Weakly Supervised Segmentation

Zhisong Wang, Yiwen Ye, Ziyang Chen, Yong Xia

TL;DR

GradTrack tackles the high cost of pixel-wise annotations in medical image segmentation by leveraging physician gaze data. It introduces Gaze Track Map Generation (GTMG) and Track Attention (TA) to progressively inject gaze-derived priors into a U-Net backbone, using reverse-truncated gaze tracks and distance-based maps to supervise decoder stages. Empirical results on Kvasir-SEG and NCI-ISBI show that GradTrack consistently outperforms existing gaze-based WSSS methods, with Dice improvements of 3.21% and 2.61% respectively, and narrows the gap to fully supervised models such as nnU-Net. The work demonstrates that incorporating the temporal order of gaze fixations can substantially boost weakly supervised medical image segmentation while reducing the annotation burden.

Abstract

Weakly supervised semantic segmentation (WSSS) in medical imaging struggles to use sparse annotations effectively. One promising direction for WSSS leverages gaze annotations, captured via eye trackers that record regions of interest during diagnostic procedures. However, existing gaze-based methods, such as GazeMedSeg, do not fully exploit the rich information embedded in gaze data. In this paper, we propose GradTrack, a framework that utilizes physicians' gaze tracks, including fixation points, durations, and temporal order, to enhance WSSS performance. GradTrack comprises two key components: Gaze Track Map Generation and Track Attention, which collaboratively enable progressive feature refinement through multi-level gaze supervision during the decoding process. Experiments on the Kvasir-SEG and NCI-ISBI datasets demonstrate that GradTrack consistently outperforms existing gaze-based methods, achieving Dice score improvements of 3.21% and 2.61%, respectively. Moreover, GradTrack significantly narrows the performance gap with fully supervised models such as nnU-Net.
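The improvements above are reported as Dice scores. For readers unfamiliar with the metric, the following is a minimal sketch of the standard Dice coefficient on binary masks. The function name `dice_score` and the smoothing term `eps` are illustrative assumptions, not taken from the paper, and the authors' evaluation code may differ in detail.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    `eps` is an assumed smoothing term that avoids division by zero
    when both masks are empty; it is not specified in the source.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: the prediction covers two pixels, one of which is correct.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = dice_score(pred, target)  # 2 * 1 / (2 + 1), about 0.667
```

A score of 1 means the predicted and reference masks coincide, and 0 means they do not overlap at all.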

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Three frameworks of gaze-based WSSS. (a) VAM-based: Directly using VAM as supervision. (b) GazeMedSeg: Applying multiple thresholds to VAM to generate over- and under-activation maps for joint supervision. (c) GradTrack: Using multiple gaze track attention maps to provide stronger information for overly under-activation maps. "Activ.": Abbreviation of activation.
  • Figure 2: Overview of the proposed GradTrack. (a) The training pipeline of GradTrack, where the red dashed lines denote the supervision information. (b) Track Attention (TA) Module: Composed of three convolutional blocks, it enhances the model’s prior information by truncating the learned GTMG information and fusing it with the main features. (c) Gaze Track Map Generation (GTMG) Module: Generates gaze attention maps by applying different reverse truncation ratios to the gaze track and using a distance-based exponential decay function to enrich the supervision information. "Activ.": Abbreviation of activation.
  • Figure 3: Visualization of segmentation results obtained using gaze-based methods, including VAM, VAM$_{\text{D-CRF}}$, GazeMedSeg, and GradTrack, on the NCI-ISBI and Kvasir-SEG datasets.
  • Figure 4: Ablation study and hyper-parameter $\tau$ discussion on the Kvasir-SEG dataset. (a) Analyzing the contribution of each component within our GradTrack. (b) Performance of our GradTrack with various values of $\tau$.
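
The Figure 2(c) caption describes GTMG as applying reverse truncation ratios to the gaze track and then a distance-based exponential decay. The sketch below illustrates one possible reading of that description, not the authors' implementation. It assumes that reverse truncation keeps the most recent fraction of fixations and that the map value decays as exp(-d / sigma) with distance d to the nearest kept fixation. The names `gaze_track_map`, `keep_ratio`, and `sigma` are hypothetical, and fixation durations are ignored for simplicity.

```python
import numpy as np

def gaze_track_map(fixations, shape, keep_ratio, sigma=10.0):
    """Build a soft attention map from a temporally ordered gaze track.

    fixations  : sequence of (row, col) points, earliest first.
    shape      : (height, width) of the output map.
    keep_ratio : fraction of the track to keep. ASSUMPTION: "reverse
                 truncation" keeps the latest fixations.
    sigma      : decay length in pixels (hypothetical parameter).
    """
    fixations = np.asarray(fixations, dtype=float)
    n_keep = max(1, int(np.ceil(keep_ratio * len(fixations))))
    kept = fixations[-n_keep:]

    # Pixel coordinate grid with shape (H, W, 2).
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.stack([rows, cols], axis=-1).astype(float)

    # Euclidean distance from every pixel to its nearest kept fixation.
    diff = grid[:, :, None, :] - kept[None, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=-1)

    # Exponential decay: 1 at a fixation, approaching 0 far from the track.
    return np.exp(-dist / sigma)

# One map per truncation ratio, e.g. one per supervised decoder stage.
track = [(0, 0), (1, 1), (3, 3), (4, 4)]
maps = [gaze_track_map(track, (5, 5), r, sigma=2.0) for r in (0.5, 1.0)]
```

Under these assumptions, a smaller `keep_ratio` yields a map concentrated around the final fixations, while a ratio of 1.0 covers the whole track.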