Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu, Bing Shuai, Yanbei Chen, Zhenlin Xu, Davide Modolo
TL;DR
The paper introduces Path Consistency as a self-supervised signal for robust multi-object tracking, enabling the model to learn long-distance object associations without identity labels. A Path Consistency Loss (PCL) enforces agreement among association distributions computed along multiple observation paths, while regularizers prevent degenerate mappings and enforce forward-backward consistency. Evaluated on MOT17, PersonPath22, and KITTI, the method achieves state-of-the-art performance among unsupervised approaches and comes close to supervised methods, and ablations confirm the value of long-distance matching and robustness to occlusion. The approach relies on frame-skipping observation paths, a learned object-embedding space, and a null object that handles disappearances, offering scalable self-supervised MOT that generalizes across challenging scenarios.
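To make the null-object idea concrete, here is a minimal sketch of an association distribution in an embedding space with one extra slot meaning "no match". It assumes cosine similarity between embeddings, a softmax temperature `temperature`, and a fixed logit `null_logit` for the null object; all names are hypothetical, and the paper's model may score the null object differently (for example, with a learned embedding).

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two object embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def association_with_null(query: Sequence[float],
                          candidates: List[Sequence[float]],
                          null_logit: float = 0.0,
                          temperature: float = 0.1) -> List[float]:
    """Distribution over the candidates in a target frame plus a final null slot.

    The last entry is the probability that the query object has no match in the
    target frame (e.g. it is occluded or has left the view).
    Assumption (hypothetical): the null object is scored with a fixed logit,
    not with a learned embedding.
    """
    logits = [cosine(query, c) / temperature for c in candidates]
    logits.append(null_logit)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: a visible object matches its look-alike (index 0 gets almost all the
# mass); an object with no similar candidate falls to the null slot.
probs = association_with_null([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
gone = association_with_null([1.0, 0.0], [[-1.0, 0.0]])
```

Because each candidate logit is a cosine divided by `temperature`, `null_logit` acts as a similarity threshold in this sketch: a candidate is preferred over the null slot only if its cosine exceeds `null_logit * temperature`.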
Abstract
In this paper, we propose a novel concept of path consistency to learn robust object matching without manual object identity supervision. Our key idea is that, to track an object through frames, we can obtain multiple different association results from a model by varying the frames it can observe, i.e., by skipping frames in its observation. Because these differences in observation do not alter the identities of the objects, the resulting associations should be consistent. Based on this rationale, we generate multiple observation paths, each specifying a different set of frames to skip, and formulate the Path Consistency Loss, which enforces that the association results agree across the different observation paths. We use the proposed loss to train our object matching model with self-supervision only. Through extensive experiments on three tracking datasets (MOT17, PersonPath22, KITTI), we demonstrate that our method outperforms existing unsupervised methods by consistent margins on various evaluation metrics and even achieves performance close to that of supervised methods.
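The path-consistency idea can be illustrated with a small sketch. As a stand-in for the paper's learned matching model, it assumes that each observation path is scored by chaining row-stochastic pairwise associations (dot-product similarity plus softmax) through the frames the path keeps, and it measures disagreement as the average KL divergence between each path's distribution and the mean over all paths. These choices, and every function name below, are hypothetical illustrations rather than the authors' formulation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Frame = List[Sequence[float]]  # one embedding per detected object


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_assoc(src: Frame, dst: Frame, temperature: float = 0.1) -> Matrix:
    """Row i is a distribution over the objects in dst for object i in src.

    Assumption (hypothetical): dot-product similarity + softmax stands in for
    the learned matching model.
    """
    return [softmax([sum(a * b for a, b in zip(u, v)) / temperature for v in dst])
            for u in src]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def path_distribution(frames: List[Frame], path: List[int]) -> Matrix:
    """Chain associations through the frames a path observes (needs len(path) >= 2)."""
    dist = pairwise_assoc(frames[path[0]], frames[path[1]])
    for s, t in zip(path[1:-1], path[2:]):
        dist = matmul(dist, pairwise_assoc(frames[s], frames[t]))
    return dist


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def path_consistency_loss(frames: List[Frame], paths: List[List[int]]) -> float:
    """Average KL from each path's distribution to the mean over paths."""
    dists = [path_distribution(frames, p) for p in paths]
    n_obj = len(dists[0])
    loss = 0.0
    for i in range(n_obj):
        rows = [d[i] for d in dists]  # one distribution per path for object i
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        loss += sum(kl(r, mean) for r in rows) / len(rows)
    return loss / n_obj


# Usage: every path starts at frame 0 and ends at frame 3 but skips different frames.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.95, 0.05], [0.05, 0.95]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.0], [0.0, 1.0]],
]
paths = [[0, 3], [0, 1, 3], [0, 2, 3], [0, 1, 2, 3]]
consistent = path_consistency_loss(frames, paths)

# Frame 1 now holds two indistinguishable embeddings (e.g. an occlusion).
ambiguous_frames = [frames[0], [[0.7, 0.7], [0.7, 0.7]], frames[2], frames[3]]
ambiguous = path_consistency_loss(ambiguous_frames, paths)
```

In the ambiguous case, the paths that observe frame 1 spread probability evenly over both objects, while the paths that skip it stay confident, so the paths disagree and the loss rises well above the consistent case.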
