TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking

Thuc Nguyen-Quang, Minh-Triet Tran

TL;DR

This work addresses multi-object tracking (MOT) under challenging reidentification scenarios, introducing a training-free Spatial-aware Sparse Memory (SASM) and an Overlapping-aware Feature Selector (OFS) that selectively store and propagate informative features based on object motion and overlap. Integrated into MOTRv2, the approach stores longer temporal information with limited features, improving association and reidentification (AssA and IDF1) on DanceTrack by 2.0 and 2.1 points, respectively, while maintaining a competitive HOTA score. The key contributions are the SASM module for motion-driven memory sparsification, the OFS module for noise reduction during overlaps, and substantial empirical gains with an emphasis on memory efficiency. This method offers a practical, training-free path to stronger MOT performance, with clear opportunities for end-to-end integration and adaptive memory management in future work.

Abstract

Multi-object tracking (MOT) in computer vision remains a significant challenge, requiring precise localization and continuous tracking of multiple objects in video sequences. The emergence of datasets that emphasize robust reidentification, such as DanceTrack, has highlighted the need for effective solutions. While memory-based approaches have shown promise, they often suffer from high computational complexity and memory usage because they store features at every single frame. In this paper, we propose a novel memory-based approach that selectively stores critical features based on object motion and overlap awareness, aiming to enhance efficiency while minimizing redundancy. As a result, our method not only stores longer temporal information with a limited number of features in memory, but also diversifies the stored states of each object to enhance association performance. Our approach significantly improves over MOTRv2 on the DanceTrack test set, with gains of 2.0% in AssA and 2.1% in IDF1.

Paper Structure (21 sections, 1 equation, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Our proposed method integrates into the MOTRv2 [motrv2] model as a training-free module, preserving MOTRv2's flexibility for various applications. As depicted in the figure, the pipeline takes proposals generated by YOLOX [yolox2021] and previously tracked objects (detailed in Section \ref{sec:method_revisit}) as input for MOTRv2. The Spatial-aware Sparse Memory (SASM), further explained in Figure 2, then processes MOTRv2's outputs.
  • Figure 2: The Spatial-aware Sparse Memory (SASM) takes the output of MOTRv2 as input, which includes both object queries and coordinates. Using the object coordinates, the module skips objects with small displacement, carrying their accumulated distance over to the next frame, and stores only objects with large displacement, as described in Section \ref{sec:method_SASM}. The Overlapping-aware Feature Selector (OFS) selects the best feature of each instance since the last memory update, as described in Section \ref{sec:method_ofs}.
  • Figure 3: Qualitative results for tracking consistency
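
The memory behaviour described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the class name `TrackMemory`, the pixel threshold `move_thresh`, the per-track `capacity`, and the use of lowest IoU with other boxes as the criterion for the "best" feature are all assumptions made for illustration. The sketch accumulates each object's center displacement across frames, remembers the least-overlapped feature seen since the last write, and commits that feature to memory only once the accumulated displacement is large enough.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import Any, Optional


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class TrackMemory:
    """Hypothetical per-track sparse memory (illustrative only)."""
    capacity: int = 4           # assumed cap on stored features per track
    move_thresh: float = 20.0   # assumed displacement threshold, in pixels
    stored: list = field(default_factory=list)
    last_center: Optional[tuple] = None
    travelled: float = 0.0      # displacement accumulated since last write
    best: Optional[tuple] = None  # (overlap, feature) candidate since last write

    def update(self, box, feature: Any, other_boxes) -> bool:
        """Process one frame; return True if a feature was written to memory."""
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        if self.last_center is not None:
            self.travelled += hypot(cx - self.last_center[0],
                                    cy - self.last_center[1])
        self.last_center = (cx, cy)

        # OFS-like step (assumption): prefer the feature observed while the
        # object overlapped least with any other object.
        overlap = max((iou(box, o) for o in other_boxes), default=0.0)
        if self.best is None or overlap < self.best[0]:
            self.best = (overlap, feature)

        # SASM-like step: small displacements only accumulate; a write
        # happens once the accumulated displacement reaches the threshold.
        if self.travelled >= self.move_thresh:
            self.stored.append(self.best[1])
            self.stored = self.stored[-self.capacity:]  # keep newest entries
            self.travelled = 0.0
            self.best = None
            return True
        return False
```

A caller would keep one `TrackMemory` per track identity and call `update` once per frame with that track's box, its query feature, and the boxes of all other tracks. In this sketch nothing is stored until the object has moved by `move_thresh` in total, so a slow-moving object occupies far fewer memory slots than it would with per-frame storage.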