On-the-Fly Point Annotation for Fast Medical Video Labeling

Meyer Adrien; Mazellier Jean-Paul; Jeremy Dana; Nicolas Padoy

TL;DR

This work tackles the costly process of bounding-box annotation in medical video by introducing an on-the-fly point annotation (OTF) approach, in which annotators label continuously while the video plays. OTF point annotations are fed into point-to-box teacher models (Point-DETR, Group R-CNN) that generate pseudo-box labels within a weakly semi-supervised learning framework, enabling efficient self-training of detectors. On the STARHE liver ultrasound dataset, the method achieves a $3.2\times$ speed-up in annotation time and a mean AP@50 improvement of $6.51 \pm 0.98$ over traditional methods at equivalent budgets, and the results suggest that pseudo-labels can rival or surpass fully supervised baselines under certain budgets. In practice, the approach can be implemented on any annotation platform to accelerate the integration of deep learning in video-based medical research, reducing expert workload while maintaining or improving detection performance.
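The data flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `Box`, `placeholder_point_to_box`, and `build_student_set` are hypothetical names, and the placeholder teacher simply draws a fixed-size box around each point, whereas Point-DETR or Group R-CNN would predict the box extent from the image.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def placeholder_point_to_box(point: Point, image_size: Tuple[int, int],
                             half_extent: float = 20.0) -> Box:
    """HYPOTHETICAL stand-in for a point-to-box teacher such as Point-DETR or
    Group R-CNN: returns a fixed-size box centred on the point, clipped to the
    image. A real teacher predicts the box extent from image features."""
    x, y = point
    width, height = image_size
    return Box(max(0.0, x - half_extent), max(0.0, y - half_extent),
               min(float(width), x + half_extent), min(float(height), y + half_extent))


def build_student_set(box_frames: Dict[int, List[Box]],
                      point_frames: Dict[int, List[Point]],
                      image_size: Tuple[int, int],
                      teacher: Callable[[Point, Tuple[int, int]], Box] = placeholder_point_to_box,
                      ) -> Dict[int, dict]:
    """Blend expert boxes and teacher pseudo-boxes into one student training set."""
    training_set: Dict[int, dict] = {}
    for frame_id, boxes in box_frames.items():
        training_set[frame_id] = {"boxes": list(boxes), "pseudo": False}
    for frame_id, points in point_frames.items():
        if frame_id in training_set:
            continue  # assumption: an expert box label takes precedence over a pseudo-label
        training_set[frame_id] = {"boxes": [teacher(p, image_size) for p in points],
                                  "pseudo": True}
    return training_set
```

Passing the teacher as a parameter keeps the sketch agnostic to which point-to-box model is used; swapping in a trained model only changes that one argument.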

Abstract

Purpose: In medical research, deep learning models rely on high-quality annotated data, whose production is often laborious and time-consuming. This is particularly true for detection tasks, which require bounding box annotations: because the annotator must adjust two corners, the process is inherently frame-by-frame. Given how scarce experts' time is, efficient annotation methods suitable for clinicians are needed.

Methods: We propose an on-the-fly method for live video annotation to improve annotation efficiency. In this approach, the annotator maintains a continuous single-point annotation by keeping the cursor on the object while the video plays, removing the tedious pausing and repetitive navigation inherent in traditional annotation methods. This annotation paradigm inherits the ability of point annotations to generate pseudo-labels through a point-to-box teacher model. We evaluate the approach empirically by building a dataset and comparing on-the-fly annotation time against that of the traditional annotation method.

Results: Annotation with our method was 3.2× faster than with the traditional technique. On the developed dataset, we achieved a mean improvement of 6.51 ± 0.98 AP@50 over the conventional method at equivalent annotation budgets.

Conclusion: Without bells and whistles, our approach offers a significant speed-up in annotation tasks. It can be easily implemented on any annotation platform to accelerate the integration of deep learning in video-based medical research.
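One way to turn the continuous cursor stream of on-the-fly annotation into per-frame point labels is sketched below. The sampling scheme is an assumption for illustration, not something the paper specifies: cursor positions are logged with timestamps during playback, each frame takes the most recent sample, a `None` sample marks the cursor leaving the object, and samples older than the hypothetical `max_age` threshold count as missing.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
# (timestamp in seconds, cursor position, or None when the cursor is off the object)
Sample = Tuple[float, Optional[Point]]


def samples_to_frame_points(samples: List[Sample], fps: float, n_frames: int,
                            max_age: float = 0.1) -> Dict[int, Point]:
    """Map a time-sorted cursor log onto video frames.

    Each frame takes the most recent sample recorded at or before its
    timestamp. Frames with no earlier sample, a sample older than `max_age`
    seconds, or a None sample receive no point label.
    """
    times = [t for t, _ in samples]
    labels: Dict[int, Point] = {}
    for frame in range(n_frames):
        t_frame = frame / fps
        i = bisect.bisect_right(times, t_frame) - 1  # index of latest sample <= t_frame
        if i < 0:
            continue
        t_sample, xy = samples[i]
        if xy is None or t_frame - t_sample > max_age:
            continue
        labels[frame] = xy
    return labels
```

Holding the last sample keeps every label at a position the annotator actually pointed to; interpolating between samples would be an alternative design, at the cost of producing points the annotator never placed.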

Paper Structure (13 sections, 2 equations, 5 figures)

Introduction
Related work
Crowdsourcing Annotations
Semi-Supervised/Weakly Supervised Object Detection
Methodology
Live Video Annotation
On-the-fly Point Annotation
Experimental Setup
STARHE Dataset
Teacher models Training
Results
Conclusion
Acknowledgement

Figures (5)

Figure 1: (a) Conventional bounding box annotation approach on static frames: ① and ② adjust the two corners; ③ navigate the video. (b) Our proposed on-the-fly point annotation method on live video: ① point at the target structure. Cyan boxes: lesions; solid lines: ground truth; dashed lines: predicted pseudo-labels.
Figure 2: (a) Box plot comparing annotation times of the OTF and BBox methods. (b) Pairwise comparison of annotation times for each timed video, with a fitted line illustrating the relationship between $T_{BBox}$ and $T_{OTF}$.
Figure 3: Spatial density of OTF annotation locations across all videos, relative to the corresponding boxes. The x-axis gives the horizontal position and the y-axis the vertical position of the OTF point within the box. Yellow areas indicate a higher density of annotations; dark blue areas indicate a lower density.
Figure 4: AP@50 of student models under similar annotation budgets, trained on a blend of box-level and pseudo-labels. Results (mean and standard deviation) are computed over three independent runs. $S_{OTF}$ uses pseudo-labels from point-to-box models driven by OTF annotations, whereas $S_{BBox}$ uses pseudo-labels derived from Faster R-CNN or DETR without a point prior.
Figure 5: Qualitative results of $S_{OTF}$. The left column shows the ground truth, with both point and corresponding bounding box annotations for lesions. The middle column shows the pseudo-labels predicted with $S_{OTF}$, and the right column shows the predictions of the fully supervised model.
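The quantities behind Figures 2 to 4 can be computed with a few standard formulas, sketched below. The definitions are assumptions made for illustration: the speed-up is taken as total BBox time divided by total OTF time over the same videos, the fitted line is an ordinary least-squares regression of $T_{OTF}$ on $T_{BBox}$, and the box-relative position scales each OTF point to the unit square of its box. The `iou` helper reflects the standard AP@50 criterion, under which a prediction counts as a true positive when its IoU with a ground-truth box is at least 0.5.

```python
import statistics
from typing import Sequence, Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def speedup(t_bbox: Sequence[float], t_otf: Sequence[float]) -> float:
    """Total BBox annotation time divided by total OTF time on the same videos."""
    return sum(t_bbox) / sum(t_otf)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def relative_position(point: Tuple[float, float], box: BoxXYXY) -> Tuple[float, float]:
    """Point location inside its box, scaled to [0, 1] along each axis."""
    (x, y), (x1, y1, x2, y2) = point, box
    return (x - x1) / (x2 - x1), (y - y1) / (y2 - y1)


def iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, e.g. of AP@50 over repeated runs."""
    return statistics.fmean(values), statistics.stdev(values)
```

If OTF time were exactly proportional to BBox time, the fitted slope would be the reciprocal of the speed-up and the intercept would be zero; deviations from that line show where the two methods scale differently.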
