Post-Hoc MOTS: Exploring the Capabilities of Time-Symmetric Multi-Object Tracking
Gergely Szabó, Zsófia Molnár, András Horváth
TL;DR
This work extends time-symmetric tracking (TS) to offline multi-object tracking and segmentation (MOTS) beyond videomicroscopy by evaluating it on synthetic scenarios and pedestrian MOTS data. It contrasts TS with a Kalman filter and restricted TS variants, and introduces a memory-optimized refactor of the TS pipeline that separates data preparation, local tracking, global assignment, and ID reduction. The study uses IoU$_{50}$ and HOTA metrics (including DetA and AssA) to quantify association and detection performance, and it includes an attention analysis of the local tracker to understand morphology- and color-based cues. The findings show that TS achieves strong associative tracking, performs comparably to Tracktor on MOTS in terms of HOTA, and substantially outperforms the baselines in morphology-aware and visually cued scenarios, illustrating its broad applicability when inference speed is acceptable.
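As a rough illustration of the metrics named above, the following minimal sketch computes a mask IoU, applies a 0.5 threshold, and combines DetA and AssA into a single-threshold HOTA score as their geometric mean, following the standard HOTA definition (the full HOTA score averages this value over a range of localization thresholds). The function names and the set-of-pixels mask representation are hypothetical, and reading IoU$_{50}$ as "IoU of at least 0.5 counts as a match" is an assumption, not a detail confirmed by this work.

```python
import math


def mask_iou(mask_a, mask_b):
    """Intersection over union of two masks given as iterables of (row, col) pixels."""
    a, b = set(mask_a), set(mask_b)
    union = len(a | b)
    # Two empty masks have no overlap to measure; return 0.0 by convention here.
    return len(a & b) / union if union else 0.0


def is_match_at_50(mask_a, mask_b):
    """Assumed reading of IoU_50: a pair matches when its IoU is at least 0.5."""
    return mask_iou(mask_a, mask_b) >= 0.5


def hota_at_alpha(det_a, ass_a):
    """HOTA at one localization threshold: geometric mean of DetA and AssA."""
    return math.sqrt(det_a * ass_a)


# Two 2x4 pixel blocks that overlap in one row of four pixels.
pred = {(r, c) for r in range(0, 2) for c in range(4)}
truth = {(r, c) for r in range(1, 3) for c in range(4)}
iou = mask_iou(pred, truth)  # 4 shared pixels out of 12 in the union
matched = is_match_at_50(pred, truth)
```

Because the geometric mean penalizes imbalance, a tracker with strong association but weak detection (or the reverse) scores lower than one that is moderately good at both, which is why DetA and AssA are worth reporting alongside HOTA.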
Abstract
Temporal forward-tracking has been the dominant approach for multi-object tracking and segmentation (MOTS). However, a novel time-symmetric tracking methodology has recently been introduced for the detection, segmentation, and tracking of budding yeast cells in pre-recorded samples. Although this architecture has demonstrated a unique perspective on stable and consistent tracking, as well as missed instance re-interpolation, its evaluation has so far been largely confined to settings related to videomicroscopic environments. In this work, we aim to reveal the broader capabilities, advantages, and potential challenges of this architecture across various specifically designed scenarios, including a pedestrian tracking dataset. We also conduct an ablation study comparing the model against its restricted variants and the widely used Kalman filter. Furthermore, we present an attention analysis of the tracking architecture for both pretrained and non-pretrained models.
