Enjoying Information Dividend: Gaze Track-based Medical Weakly Supervised Segmentation
Zhisong Wang, Yiwen Ye, Ziyang Chen, Yong Xia
TL;DR
GradTrack tackles the high cost of pixel-wise annotation in medical image segmentation by using physicians' gaze data as weak supervision. It introduces Gaze Track Map Generation (GTMG) and Track Attention (TA) to progressively inject gaze-derived priors into a U-Net backbone, using reverse-truncated gaze tracks and distance-based maps to supervise the decoder stages. On Kvasir-SEG and NCI-ISBI, GradTrack consistently outperforms existing gaze-based WSSS methods, with Dice improvements of 3.21% and 2.61% respectively, and narrows the gap to fully supervised models such as nnU-Net. The work shows that the temporal order of gaze fixations, and not only their locations, can substantially improve weakly supervised medical image segmentation while reducing the annotation burden.
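The idea of supervising several decoder stages with maps derived from truncated gaze tracks can be illustrated with a minimal sketch. This is not the paper's GTMG implementation: the function names, the Gaussian decay, the `sigma` value, and the reading of "reverse-truncated" as "drop fixations from the end of the track" are all assumptions made for illustration, and fixation durations are ignored.

```python
import numpy as np

def distance_map(points, shape, sigma=10.0):
    """Soft map in (0, 1]: 1.0 at a fixation, decaying with distance to the nearest one.

    Assumption: a Gaussian of the nearest-fixation distance; the paper's
    distance-based map may be defined differently.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.asarray(points, dtype=float)  # (n, 2), each row is (row, col)
    # Squared distance from every pixel to every fixation, shape (n, h, w).
    d2 = (ys[None] - pts[:, 0, None, None]) ** 2 + (xs[None] - pts[:, 1, None, None]) ** 2
    return np.exp(-d2.min(axis=0) / (2.0 * sigma ** 2))

def truncated_track_maps(track, shape, num_levels=3, sigma=10.0):
    """Return one map per (hypothetical) decoder level.

    Level 0 uses the full track; each later level drops more fixations from
    the end of the temporally ordered track. This truncation rule is an
    assumption, not the authors' definition.
    """
    n = len(track)
    maps = []
    for level in range(num_levels):
        # Integer ceiling of n * (num_levels - level) / num_levels, at least 1.
        keep = max(1, -(-n * (num_levels - level) // num_levels))
        maps.append(distance_map(track[:keep], shape, sigma))
    return maps

# Six fixations in temporal order on a 32x32 image (made-up coordinates).
track = [(5, 5), (10, 12), (20, 20), (25, 8), (8, 25), (16, 16)]
maps = truncated_track_maps(track, (32, 32))
```

Under these assumptions, each map could serve as a soft target for one decoder stage, with the full-track map covering the most area and the shortest prefix the least.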
Abstract
Weakly supervised semantic segmentation (WSSS) in medical imaging struggles to make effective use of sparse annotations. One promising direction for WSSS leverages gaze annotations, captured via eye trackers that record regions of interest during diagnostic procedures. However, existing gaze-based methods, such as GazeMedSeg, do not fully exploit the rich information embedded in gaze data. In this paper, we propose GradTrack, a framework that utilizes physicians' gaze tracks, including fixation points, durations, and temporal order, to enhance WSSS performance. GradTrack comprises two key components, Gaze Track Map Generation and Track Attention, which together enable progressive feature refinement through multi-level gaze supervision during the decoding process. Experiments on the Kvasir-SEG and NCI-ISBI datasets demonstrate that GradTrack consistently outperforms existing gaze-based methods, achieving Dice score improvements of 3.21% and 2.61%, respectively. Moreover, GradTrack significantly narrows the performance gap with fully supervised models such as nnU-Net.
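For readers unfamiliar with the metric behind the reported improvements, the Dice score of a predicted mask A against a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below is a generic NumPy version for binary masks; the function name and the `eps` smoothing term are illustrative choices, and it is not the evaluation code used in the paper.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape.

    eps (an illustrative choice) avoids division by zero when both masks
    are empty, in which case the score is 1.0.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Two 4-pixel masks that overlap on 2 pixels: Dice = 2*2 / (4+4) = 0.5.
pred = np.zeros((4, 4), dtype=bool)
target = np.zeros((4, 4), dtype=bool)
pred[0, 0:4] = True
target[0, 2:4] = True
target[1, 2:4] = True
score = dice_score(pred, target)
```

A gain of 3.21% on this scale corresponds to 0.0321 in Dice if the improvement is reported in absolute percentage points, which is an assumption about the reporting convention.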
