Rapid Object Annotation

Misha Denil

TL;DR

This work tackles the bottleneck of annotating bounding boxes for novel objects in video by introducing an interactive annotation tool that leverages an objectness prior and label propagation. The approach combines CenterNet-inspired descriptor maps, continuous tracking, and feature caching to deliver a roughly fivefold speedup over traditional extreme-click annotation while maintaining reasonable label quality. In experiments on three target objects and multiple annotation styles, the author reports large reductions in annotation time and provides IoU-based assessments of label accuracy. The findings suggest practical utility for rapid data collection to train detectors for new objects, provided attention is paid to UI latency and generalization to diverse scenes.

Abstract

In this report we consider the problem of rapidly annotating a video with bounding boxes for a novel object. We describe a UI and associated workflow designed to make this process fast for an arbitrary novel target.

Paper Structure (28 sections, 2 equations, 9 figures, 1 table)

Introduction
Annotation tool
Viewport
Timeline
Autotrack
Sparklines
Smartjump
Extreme clicking
Experiments
Target objects
Videos
Annotation styles
XClick
Click
Boxes
...and 13 more sections

Figures (9)

Figure 1: Example of the annotation UI.
Figure 2: Example of why it is useful to show the bounding boxes. The descriptor location in this example seems reasonable for the target object, but the predicted bounding box shows that the descriptor content is not.
Figure 3: The three target objects we consider in this report. From left to right they are an infra-red thermometer, a pair of pliers, and a clock.
Figure 4: Annotation times under different tool settings, aggregated across videos and objects. The red vertical lines indicate "real time." The black vertical lines show the time per box reported elsewhere for xclick.
Figure 5: Left: Fraction of annotated frames for each method where annotating less than 100% of frames is possible. Right: Total time spent annotating in the different styles. Each annotation style counts the time to annotate 6 videos, except "boxes", which counts the time to annotate 2 videos. If we had annotated all 6 videos using boxes, we would expect it to have taken 2:32:12.
...and 4 more figures
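
The TL;DR mentions IoU-based assessments of label accuracy. As a reminder of what that metric computes, here is a minimal sketch of intersection over union for two axis-aligned boxes. The function name `iou` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration; the report's own evaluation code and box format are not shown on this page.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is assumed to be (x_min, y_min, x_max, y_max); this layout is
    an illustrative choice, not necessarily the one used in the report.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    # Union area = sum of both areas minus the doubly counted overlap.
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes yield an IoU of 0 rather than an error.
    return inter / union if union > 0 else 0.0
```

An IoU of 1 means a propagated box matches the reference box exactly, and 0 means the two boxes do not overlap at all.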
