Self-Supervised Multi-Object Tracking with Path Consistency

Zijia Lu, Bing Shuai, Yanbei Chen, Zhenlin Xu, Davide Modolo

TL;DR

The paper introduces path consistency as a self-supervised signal for multi-object tracking (MOT), allowing a model to learn long-distance object associations without identity labels. A Path Consistency Loss (PCL) enforces agreement among association distributions computed along multiple observation paths, while regularizers prevent degenerate mappings and enforce forward–backward consistency. Evaluated on MOT17, PersonPath22, and KITTI, the method achieves state-of-the-art performance among unsupervised approaches and comes close to supervised methods, with ablations confirming the value of long-distance matching and robustness to occlusion. The approach relies on frame-skipping observation paths, a learned object-embedding space, and a null object to handle disappearances, offering scalable self-supervised MOT that generalizes across challenging scenarios.

Abstract

In this paper, we propose a novel concept of path consistency to learn robust object matching without using manual object identity supervision. Our key idea is that, to track an object through frames, we can obtain multiple different association results from a model by varying the frames it can observe, i.e., skipping frames in observation. As the differences in observations do not alter the identities of objects, the obtained association results should be consistent. Based on this rationale, we generate multiple observation paths, each specifying a different set of frames to be skipped, and formulate the Path Consistency Loss, which requires the association results to be consistent across different observation paths. We use the proposed loss to train our object matching model with only self-supervision. Through extensive experiments on three tracking datasets (MOT17, PersonPath22, KITTI), we demonstrate that our method outperforms existing unsupervised methods by consistent margins on various evaluation metrics, and even achieves performance close to supervised methods.
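The idea of comparing association results across observation paths can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the softmax-over-embedding-similarity association, the `temperature` value, and the KL-to-mean disagreement measure are all assumptions chosen for illustration, and the null object and regularizers are omitted. Every function name here is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def association(emb_a, emb_b, temperature=0.1):
    # Row-stochastic matrix: entry (i, j) is the probability that object i
    # in frame a matches object j in frame b (assumed similarity model).
    return softmax(emb_a @ emb_b.T / temperature, axis=1)

def path_distribution(embeddings, path, query_idx):
    # Start with all probability mass on the query object in the first
    # frame, then chain associations through the observed frames only.
    dist = np.zeros(embeddings[path[0]].shape[0])
    dist[query_idx] = 1.0
    for a, b in zip(path[:-1], path[1:]):
        dist = dist @ association(embeddings[a], embeddings[b])
    return dist

def path_disagreement(dists, eps=1e-12):
    # Assumed stand-in for a consistency loss: mean KL divergence of each
    # path's end-frame distribution from the average over all paths.
    dists = np.stack(dists)
    mean = dists.mean(axis=0)
    kl = (dists * (np.log(dists + eps) - np.log(mean + eps))).sum(axis=1)
    return float(kl.mean())

# Toy clip: 4 frames, 3 detected objects per frame, 8-dim embeddings.
rng = np.random.default_rng(0)
embeddings = [rng.normal(size=(3, 8)) for _ in range(4)]

# Four observation paths from frame 0 to frame 3, skipping different frames.
paths = [[0, 1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 3]]
dists = [path_distribution(embeddings, p, query_idx=0) for p in paths]
loss = path_disagreement(dists)
print(loss)
```

With random embeddings the paths disagree, so the value is positive; a model trained to minimize such a quantity would be pushed toward giving the same end-frame association regardless of which intermediate frames it observed.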

Paper Structure (20 sections, 12 equations, 3 figures, 8 tables)

Figures (3)

  • Figure 1: We propose a novel concept of path consistency for self-supervised MOT. We define an observation path for an object as a temporal list of observed frames from the start frame to the end frame. As such, for the same object, we can generate multiple paths by skipping intermediate frames. As different observations of the same object do not alter its identity, the association results should be consistent across different paths.
  • Figure 2: Overview of Path Consistency Loss (PCL). Our method takes a video clip as input, where objects are localized by an off-the-shelf detector, and uses a selection strategy to choose suitable query objects and their corresponding end frames. It then computes PCL to learn the association between query objects and objects in the end frames. Association probabilities obtained from different paths provide cross-supervision for one another and enable self-supervised model learning.
  • Figure 3: Qualitative comparison between our model and UNS. We visualize the tracking on three frames. UNS cannot track the person in the green bounding box once he is occluded and assigns him a new ID (purple) on frame 80. It also fails to track the person in the pink bounding box when she is only partially visible in frame 80. Our model consistently tracks both people with the same IDs (green, pink).
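The definition of an observation path in the Figure 1 caption (a temporal list of observed frames from the start frame to the end frame, with intermediate frames optionally skipped) can be made concrete with a small enumeration sketch. The function name `observation_paths` and the `max_skip` limit on consecutive skipped frames are hypothetical additions for illustration; the summary does not say how the paper selects or constrains paths.

```python
from itertools import combinations

def observation_paths(start, end, max_skip=None):
    # Enumerate every frame list that keeps both endpoints and any subset
    # of the intermediate frames, in temporal order.
    middle = range(start + 1, end)
    paths = []
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):
            path = [start, *kept, end]
            # Hypothetical constraint: drop paths that skip more than
            # `max_skip` consecutive frames between two observations.
            if max_skip is not None and any(
                b - a - 1 > max_skip for a, b in zip(path, path[1:])
            ):
                continue
            paths.append(path)
    return paths

print(observation_paths(0, 3))
# [[0, 3], [0, 1, 3], [0, 2, 3], [0, 1, 2, 3]]
```

With n intermediate frames there are 2^n unconstrained paths, so a practical system would likely sample or limit them; `max_skip=1` on the example above, for instance, drops only the direct path `[0, 3]`.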