Post-Hoc MOTS: Exploring the Capabilities of Time-Symmetric Multi-Object Tracking

Gergely Szabó, Zsófia Molnár, András Horváth

TL;DR

This work extends time-symmetric tracking (TS) to offline multi-object tracking and segmentation (MOTS) beyond videomicroscopy by evaluating it on synthetic scenarios and pedestrian MOTS data. It contrasts TS with a Kalman filter and restricted TS variants, and introduces a memory-optimized refactor of the TS pipeline that separates data preparation, local tracking, global assignment, and ID reduction. The study uses IoU$_{50}$ and HOTA metrics (including DetA and AssA) to quantify association and detection performance, and it includes an attention analysis of the local tracker to understand morphology- and color-based cues. The findings show that TS achieves strong associative tracking, performs comparably to Tracktor on MOTS in terms of HOTA, and substantially outperforms the baselines in morphology-aware and visually cued scenarios, illustrating its broad applicability when inference speed is acceptable.
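
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how a mask IoU threshold of 0.5 and the per-threshold HOTA combination are commonly computed. It is not the paper's evaluation code. The function names (`mask_iou`, `is_match_iou50`, `hota_at_alpha`) are hypothetical, the `>=` comparison at the threshold is an assumption, and the HOTA formula follows the standard definition in which the score at a localization threshold is the geometric mean of DetA and AssA.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks: treated as no overlap here (an assumption).
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def is_match_iou50(pred, gt, threshold=0.5):
    """True if the predicted mask overlaps the ground truth with IoU >= threshold.

    Whether the comparison is strict or inclusive is an assumption.
    """
    return mask_iou(pred, gt) >= threshold


def hota_at_alpha(det_a, ass_a):
    """HOTA at a single localization threshold: sqrt(DetA * AssA).

    The final HOTA score is usually the mean of this value over a
    range of thresholds.
    """
    return float(np.sqrt(det_a * ass_a))
```

Because the per-threshold score is a geometric mean, a tracker cannot reach a high HOTA through detection quality alone; weak association lowers the score by the same factor as weak detection.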

Abstract

Temporal forward-tracking has been the dominant approach for multi-object segmentation and tracking (MOTS). However, a novel time-symmetric tracking methodology has recently been introduced for the detection, segmentation, and tracking of budding yeast cells in pre-recorded samples. Although this architecture has demonstrated a unique perspective on stable and consistent tracking, as well as missed instance re-interpolation, its evaluation has so far been largely confined to settings related to videomicroscopic environments. In this work, we aim to reveal the broader capabilities, advantages, and potential challenges of this architecture across various specifically designed scenarios, including a pedestrian tracking dataset. We also conduct an ablation study comparing the model against its restricted variants and the widely used Kalman filter. Furthermore, we present an attention analysis of the tracking architecture for both pretrained and non-pretrained models.

Paper Structure

This paper contains 16 sections, 8 equations, 7 figures.

Figures (7)

  • Figure 1: Data flow diagram of the TS architecture, illustrating the process from raw input image sequence to finalized track predictions.
  • Figure 2: A display of memory usage and runtime differences between the original implementation of the TS architecture tracking segment and our improved implementation.
  • Figure 3: KDE (top) and mean (bottom) metric results of tracker models Kalman, TS, TS-L2 and TS-Shape for datasets Synthetic Arrows and Synthetic Amoeboids.
  • Figure 4: KDE (top) and mean (bottom) metric results of tracker models Kalman, TS, TS-L2 and TS-Shape for scenario "Visual signaling" datasets Synthetic Arrows, Synthetic Arrows TR-1 and Synthetic Arrows TR-2.
  • Figure 5: KDE (top) and mean (bottom) metric results of tracker models Kalman, TS, TS-L2 and TS-Shape for scenario "Semi-random positioning" datasets Synthetic Amoeboids, Synthetic Amoeboids RP-1/20 and Synthetic Amoeboids RP-1/5.
  • ...and 2 more figures