SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow

Orcun Cetintas; Tim Meinhardt; Guillem Brasó; Laura Leal-Taixé

TL;DR

The paper tackles the high cost of annotating long multi-object tracking sequences by introducing SPAM, a video label engine that combines synthetic pretraining on MOTSynth, pseudo-labeling on real data, and active learning guided by a graph-hierarchy model to label detections and associations with minimal human input. A detection-first, graph-based labeling framework exploits temporal dependencies to propagate decisions across time, while uncertainty-driven annotator intervention is reserved for the hardest cases. Experiments show that SPAM reaches near-ground-truth tracking performance with as little as 3.3% of the manual annotations on MOT17, with low annotation budgets on MOT20 and DanceTrack as well, and that retraining trackers on SPAM-generated pseudo-labels yields substantial gains without any manual labeling. Overall, SPAM demonstrates that synthetic pretraining, self-training with pseudo-labels, and hierarchical graph reasoning can dramatically reduce labeling costs and scale up tracking datasets for data-hungry trackers; the models and code are released as open source.
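
The uncertainty-driven intervention described above can be illustrated with a small selection routine. This summary does not specify SPAM's uncertainty measure or selection rule, so the sketch below assumes binary entropy over the model's node or edge probabilities and a fixed fractional budget; `binary_entropy`, `split_by_uncertainty`, and the example scores are hypothetical and only show the general idea of querying humans on the hardest cases while pseudo-labeling the rest.

```python
import math
from typing import Dict, List, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary prediction; 1.0 at p = 0.5, 0.0 at p in {0, 1}."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def split_by_uncertainty(
    scores: Dict[str, float], budget: float
) -> Tuple[List[str], Dict[str, bool]]:
    """Route the most uncertain fraction of items to a human, pseudo-label the rest.

    Hypothetical helper (assumed entropy-based rule, not SPAM's published one).
    scores: item id -> predicted probability that a detection is valid (node)
            or that an association is correct (edge).
    budget: fraction of items (between 0 and 1) the annotator may label.
    """
    n_query = int(round(budget * len(scores)))
    # Most uncertain first; ties are broken by id so the order is deterministic.
    ranked = sorted(scores, key=lambda k: (-binary_entropy(scores[k]), k))
    to_human = ranked[:n_query]
    queried = set(to_human)
    # Items not queried keep the model's thresholded decision as a pseudo-label.
    pseudo = {k: scores[k] >= 0.5 for k in scores if k not in queried}
    return to_human, pseudo


# Example: five edge scores with a 40% annotation budget.
scores = {"e1": 0.98, "e2": 0.52, "e3": 0.10, "e4": 0.45, "e5": 0.91}
to_human, pseudo = split_by_uncertainty(scores, budget=0.4)
print(to_human)  # ['e2', 'e4']
print(pseudo)    # {'e1': True, 'e3': False, 'e5': True}
```

The two scores closest to 0.5 are sent to the annotator, while confident predictions in either direction are accepted as pseudo-labels, which is how a small budget can still cover every item.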

Abstract

Making trajectory annotation in videos more efficient could enable the next generation of data-hungry tracking algorithms to thrive on large-scale datasets. Despite the importance of this task, very few works explore how to label tracking datasets comprehensively and efficiently. In this work, we introduce SPAM, a video label engine that provides high-quality labels with minimal human intervention. SPAM is built around two key insights: i) most tracking scenarios are easy to resolve, so we use a pre-trained model to generate high-quality pseudo-labels and reserve human involvement for a smaller subset of more difficult instances; ii) the spatiotemporal dependencies of track annotations across time can be handled elegantly and efficiently through graphs, so we use a unified graph formulation to annotate both detections and identity associations for tracks across time. Based on these insights, SPAM produces high-quality annotations at a fraction of the ground-truth labeling cost. We demonstrate that trackers trained on SPAM labels achieve performance comparable to those trained on human annotations while requiring only $3-20\%$ of the human labeling effort. Hence, SPAM paves the way towards highly efficient labeling of large-scale tracking datasets. We release all models and code.
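
To make the unified graph formulation concrete, the sketch below assumes that each detection candidate is a node with a probability of being a valid object, and each candidate association is an edge with a probability that both detections share an identity. Node classification discards invalid candidates, and edge classification links the remaining ones into tracks. The helper `labels_from_graph` and its thresholds are hypothetical, and grouping by connected components is a deliberate simplification: it does not enforce the one-predecessor, one-successor constraints a real tracker's solver would, nor does it model SPAM's hierarchical GNNs.

```python
from typing import Dict, Tuple


def labels_from_graph(
    node_probs: Dict[int, float],
    edge_probs: Dict[Tuple[int, int], float],
    node_thr: float = 0.5,
    edge_thr: float = 0.5,
) -> Dict[int, int]:
    """Turn node/edge classification scores into track IDs (hypothetical helper).

    node_probs: detection id -> probability that the candidate is a real object.
    edge_probs: (det_a, det_b) -> probability that both detections share an identity.
    Returns a mapping from each kept detection to a track id.
    """
    # Node classification: keep only candidates predicted to be valid objects.
    kept = sorted(n for n, p in node_probs.items() if p >= node_thr)
    parent = {n: n for n in kept}

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Edge classification: merge detections joined by an active edge.
    for (a, b), p in edge_probs.items():
        if p >= edge_thr and a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    # Relabel connected components as consecutive track ids 0, 1, 2, ...
    track_of_root: Dict[int, int] = {}
    return {n: track_of_root.setdefault(find(n), len(track_of_root)) for n in kept}


# Example: detection 3 is rejected as invalid, which breaks the chain 1-3-5.
node_probs = {0: 0.9, 1: 0.8, 2: 0.95, 3: 0.2, 4: 0.7, 5: 0.85}
edge_probs = {(0, 2): 0.9, (2, 4): 0.8, (1, 3): 0.9, (1, 5): 0.3, (3, 5): 0.9}
print(labels_from_graph(node_probs, edge_probs))  # {0: 0, 1: 1, 2: 0, 4: 0, 5: 2}
```

The example shows why detections and associations benefit from being labeled jointly: rejecting a single node changes the identities of its neighbors, so a decision made (or corrected by an annotator) at one node propagates through the graph.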

Paper Structure (22 sections, 1 equation, 8 figures, 5 tables)

Introduction
Related Work
SPAM
Overview
Graph Hierarchies as a Model for Labeling
How to Label on Graph Hierarchies?
Experiments
Datasets and Metrics
Implementation Details
Synthetic Pretraining
Towards a Model for Labeling
Enhancing Annotations with Pseudo-labels and Active Learning
Putting Everything Together: SPAM Labels in Action
Conclusion
Additional Details about SPAM
...and 7 more sections

Figures (8)

Figure 1: Overview of the SPAM model. We first generate a set of detection candidates with our detector. Hierarchical GNNs then classify these candidates into valid and invalid objects via node classification, and assign identities through edge classification.
Figure 2: Overview of the SPAM training and annotation pipeline. (a) Initial model training on synthetic data. (b) Application of SPAM to generate pseudo-labels without incurring manual annotation costs on a real dataset, followed by self-training on pseudo-labels. (c) Real dataset labeling using pseudo-labels and an uncertainty-based active learning approach.
Figure 3: Our graph-based labeling pipeline begins with the selection of nodes for annotation. For each selected node, the annotator may be asked to validate the detection, refine its bounding box to improve localization, or perform association.
Figure 3: Performance gain obtained by our model when retrained on its own pseudo-labels, at no manual annotation cost.
Figure 4: Analysis of the performance gap between models trained on synthetic and on real data for the three most common tracking components: detection, association, and re-identification.
...and 3 more figures