
Gaze-Assisted Medical Image Segmentation

Leila Khaertdinova, Ilya Pershin, Tatiana Shmykova, Bulat Ibragimov

TL;DR

This paper fine-tuned the Segment Anything Model in Medical Images (MedSAM), a public solution that accepts various prompt types as additional input for semi-automated segmentation correction, using human gaze as the prompt, and found that the gaze-assisted MedSAM model outperformed state-of-the-art segmentation models.

Abstract

The annotation of patient organs is a crucial part of various diagnostic and treatment procedures, such as radiotherapy planning. Manual annotation is extremely time-consuming, while its automation using modern image analysis techniques has not yet reached levels sufficient for clinical adoption. This paper investigates the idea of semi-supervised medical image segmentation using human gaze as interactive input for segmentation correction. In particular, we fine-tuned the Segment Anything Model in Medical Images (MedSAM), a public solution that accepts various prompt types as additional input for semi-automated segmentation correction. We used human gaze data recorded while reading abdominal images as the prompt for fine-tuning MedSAM. The model was validated on the public WORD database, which consists of 120 CT scans covering 16 abdominal organs. The gaze-assisted MedSAM was shown to outperform state-of-the-art segmentation models: the average Dice coefficient over the 16 abdominal organs was 85.8%, 86.7%, 81.7%, and 90.5% for nnUNetV2, ResUNet, the original MedSAM, and our gaze-assisted MedSAM model, respectively.
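The results above are reported as the average Dice coefficient over the 16 organs. As a reminder of how that metric is computed, the sketch below evaluates Dice, 2|P ∩ R| / (|P| + |R|), separately for each organ label and takes the unweighted mean of the per-organ scores. It is a minimal illustration, not the authors' evaluation code: the function names, the integer label-map input, and the convention of scoring two empty masks as 1.0 are assumptions, and the paper may aggregate over slices, scans, or organs differently.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice = 2|P ∩ R| / (|P| + |R|) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    intersection = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)


def mean_organ_dice(pred_labels, ref_labels, organ_ids):
    """Per-organ Dice from integer label maps, plus their unweighted mean.

    Hypothetical helper: works on 2D slices or 3D volumes alike.
    """
    pred_labels = np.asarray(pred_labels)
    ref_labels = np.asarray(ref_labels)
    per_organ = {
        k: dice_coefficient(pred_labels == k, ref_labels == k) for k in organ_ids
    }
    return float(np.mean(list(per_organ.values()))), per_organ


# Toy 2x3 label maps: organ 1 is over-segmented, organ 2 matches exactly.
pred = np.array([[1, 1, 2], [0, 2, 2]])
ref = np.array([[1, 0, 2], [0, 2, 2]])
mean_dice, per_organ = mean_organ_dice(pred, ref, organ_ids=[1, 2])
# organ 1: 2*1/(2+1) = 0.667; organ 2: 1.0; mean = 0.833
print(per_organ, mean_dice)
```

Averaging per organ, rather than pooling all voxels, keeps small organs such as the pancreas from being swamped by large ones such as the liver.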

Paper Structure

This paper contains 17 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Steps to correct a stomach segmentation mask on a CT slice. Each subfigure shows the outline of reference segmentation contours, the predicted segmentation mask, and gaze points (blue) that are used for prediction.
  • Figure 2: The proposed framework for gaze-assisted interactive segmentation of medical images. An illustrative example shows the segmentation mask for the pancreas, predicted from input gaze coordinates that serve as a point prompt for the MedSAM model.
  • Figure 3: The pipeline for training a gaze-assisted segmentation model using synthetic gaze points generated with the mask-difference approach. The process begins with an initial prediction from the frozen MedSAM. From this initial prediction, we generate points that mark the differences between the predicted mask and the ground truth, referred to as mask correction points. These points are then fed into the same model to produce the final prediction.
  • Figure 4: Steps to correct segmentation masks for various abdominal organs, such as the spleen, left kidney, and liver, on different CT slices. Each subfigure shows the outline of reference segmentation contours, the predicted segmentation mask, and the gaze points (blue) used as prompts for the prediction.
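Figure 3 describes generating synthetic "mask correction points" from the difference between the frozen MedSAM's initial prediction and the ground truth. The captions do not specify the sampling rule, so the following is a hypothetical sketch only: it samples up to `n_points` pixels uniformly at random from the symmetric difference of the two masks and labels each point 1 if it lies inside the ground truth (a missed region) or 0 otherwise (a false positive). The function name `mask_correction_points`, the uniform sampling, the positive/negative labeling, and the (x, y) output order (the order SAM-style point prompts use) are all assumptions, not the authors' procedure.

```python
import numpy as np


def mask_correction_points(pred, ref, n_points=5, rng=None):
    """Hypothetical sketch: sample prompt points where `pred` disagrees with `ref`.

    Returns (coords, labels): coords is a (k, 2) array in (x, y) order and
    labels is a (k,) array with 1 = missed foreground, 0 = false positive.
    """
    rng = np.random.default_rng() if rng is None else rng
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    diff = pred ^ ref  # symmetric difference: pixels the initial prediction got wrong
    rows, cols = np.nonzero(diff)
    if rows.size == 0:
        # Prediction already matches the ground truth: no correction points.
        return np.empty((0, 2), dtype=int), np.empty((0,), dtype=int)
    k = min(n_points, rows.size)
    idx = rng.choice(rows.size, size=k, replace=False)  # uniform sampling (assumed)
    coords = np.stack([cols[idx], rows[idx]], axis=1)  # (x, y) = (column, row)
    labels = ref[rows[idx], cols[idx]].astype(int)  # inside ground truth -> 1, else 0
    return coords, labels


# Toy 4x4 example: the initial prediction misses column 2 of a 2x3 organ
# and adds a spurious pixel at row 3, column 3.
ref = np.zeros((4, 4), dtype=bool)
ref[0:2, 0:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:2] = True
pred[3, 3] = True
coords, labels = mask_correction_points(pred, ref, n_points=10, rng=np.random.default_rng(0))
print(coords, labels)
```

During training, points like these stand in for the real gaze points that serve as prompts at inference time; the second pass feeds them, together with the image, back into the same model.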