HOLa: HoloLens Object Labeling

Michael Schwimmbeck, Serouj Khajarian, Konstantin Holzapfel, Johannes Schmidt, Stefanie Remmele

TL;DR

The paper presents HOLa, a Unity/Python application that uses SAM-Track to enable fully automatic single-object labeling on HoloLens 2 with minimal user input, addressing the data annotation bottleneck in medical AR. By integrating a seed-point-prompted SAM-Track into a two-mode workflow (recording and labeling), HOLa performs frame-wise pixel labeling across sequences while keeping initialization simple. Across five experiments spanning phantom and open-liver clinical scenes, HOLa achieves Dice scores between 0.875 and 0.982 and increases labeling speed by more than 500x over manual annotation, with performance comparable to inter-rater variability. The work demonstrates the feasibility of applying foundation-model-based tracking to AR data, discusses limitations related to object fragmentation and image quality, and provides open-source tooling to facilitate rapid data management in AR research.

Abstract

In the context of medical Augmented Reality (AR) applications, object tracking is a key challenge and requires a large number of annotation masks. With the emergence of segmentation foundation models such as the Segment Anything Model (SAM), zero-shot segmentation requires only minimal human participation to obtain high-quality object masks. We introduce HoloLens-Object-Labeling (HOLa), a Unity and Python application based on the SAM-Track algorithm that offers fully automatic single-object annotation for HoloLens 2 while requiring minimal human participation. HOLa does not have to be adjusted to a specific image appearance and could thus facilitate AR research in any application field. We evaluate HOLa for different degrees of image complexity in open liver surgery and in medical phantom experiments. Using HOLa for image annotation can increase the labeling speed by more than 500 times while providing Dice scores between 0.875 and 0.982, which are comparable to human annotators. Our code is publicly available at: https://github.com/mschwimmbeck/HOLa
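
For readers unfamiliar with the metric, the Dice score reported above measures the overlap between two binary masks A and B as 2·|A ∩ B| / (|A| + |B|), where 1.0 means identical masks and 0.0 means no overlap. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name `dice_score`, the nested-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_score(pred_mask, ref_mask):
    """Dice overlap of two equally sized binary masks (nested lists of 0/1).

    Hypothetical helper for illustration only; not part of HOLa.
    """
    if len(pred_mask) != len(ref_mask):
        raise ValueError("masks must have the same number of rows")

    overlap = pred_total = ref_total = 0
    for pred_row, ref_row in zip(pred_mask, ref_mask):
        if len(pred_row) != len(ref_row):
            raise ValueError("mask rows must have the same length")
        for p, r in zip(pred_row, ref_row):
            p, r = bool(p), bool(r)
            pred_total += p          # pixels labeled in the prediction
            ref_total += r           # pixels labeled in the reference
            overlap += p and r       # pixels labeled in both

    if pred_total + ref_total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (pred_total + ref_total)


# Toy 3x3 example: the automatic mask has one extra pixel.
auto_mask = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
manual_mask = [[0, 1, 1],
               [0, 1, 0],
               [0, 0, 0]]
toy_dice = dice_score(auto_mask, manual_mask)  # 2*3 / (4+3) = 6/7
```

In this toy case the masks share 3 pixels out of 4 and 3 labeled pixels, giving 6/7 ≈ 0.857. The same computation between two human annotators gives the inter-rater baseline that the reported scores are compared against.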

Paper Structure

This paper contains 5 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The HOLa app consists of two modes. In the recording mode, the user first points a sphere cursor onto the object of interest by head motion and then selects the cursor position as the seed point for segmentation by voice command, which also starts sensor recording. The labeling mode uses the SAM [2] seed point prompt to initialize the SAM-Track [3] DeAOT tracker [4], which tracks the segmented object through all subsequent frames to obtain pixel-wise labels.
  • Figure 2: Three example frames (left to right) with their corresponding HOLa annotations for each of the five experiments.
  • Figure 3: Annotation deviations between HOLa and human annotators for a sample clinical frame. Human annotations vary strongly at organ boundaries, especially where shadows or a lack of color contrast make structures hard to separate.
  • Figure 4: Applying HOLa to a liver that is divided into multiple segments leads to incomplete labeling: only one segment of the whole liver is labeled. However, HOLa allows the user to place additional seed points during quality control.
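
The fragmentation effect described for Figure 4 can be illustrated with a toy example. The sketch below is not SAM, SAM-Track, or HOLa code: it swaps in a simple 4-connected flood fill as a stand-in for seed-point-prompted segmentation, purely to show why a single seed point covers only one disconnected segment and why additional seed points recover the rest. The function name `label_from_seeds` and the grid representation are hypothetical.

```python
from collections import deque


def label_from_seeds(foreground, seeds):
    """Label every foreground pixel 4-connected to at least one seed.

    Toy stand-in for seed-prompted segmentation (assumption, not SAM).
    foreground: nested lists of 0/1; seeds: iterable of (row, col).
    """
    rows, cols = len(foreground), len(foreground[0])
    labeled = [[0] * cols for _ in range(rows)]
    queue = deque()

    # Start the fill from each valid seed that lies on the object.
    for r, c in seeds:
        if 0 <= r < rows and 0 <= c < cols and foreground[r][c] and not labeled[r][c]:
            labeled[r][c] = 1
            queue.append((r, c))

    # Breadth-first expansion over 4-connected foreground neighbors.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and foreground[nr][nc] and not labeled[nr][nc]):
                labeled[nr][nc] = 1
                queue.append((nr, nc))
    return labeled


# An "organ" split into two segments by a background column.
scene = [[1, 1, 0, 1, 1],
         [1, 1, 0, 1, 1]]

one_seed = label_from_seeds(scene, [(0, 0)])           # left segment only
two_seeds = label_from_seeds(scene, [(0, 0), (0, 4)])  # both segments

pixels_one_seed = sum(map(sum, one_seed))
pixels_two_seeds = sum(map(sum, two_seeds))
```

With one seed, only the 4 pixels of the left segment are labeled; adding a second seed on the right segment labels all 8 object pixels, which mirrors the role of the extra seed points placed during quality control.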