
MOT FCG++: Enhanced Representation of Spatio-temporal Motion and Appearance Features

Yanzhao Fang

TL;DR

This work proposes a novel representation of appearance and spatial-temporal motion features that improves upon the hierarchical clustering association method MOT FCG, and introduces Mean Constant Velocity Modeling to reduce the effect of observation noise on target motion state estimation.

Abstract

The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames while maintaining a unique identity for each object. Most existing methods rely on the spatial-temporal motion features and appearance embedding features of the detected objects in consecutive frames. Effectively and robustly representing the spatial and appearance features of long trajectories has become a critical factor affecting the performance of MOT. We propose a novel approach for appearance and spatial-temporal motion feature representation, improving upon the hierarchical clustering association method MOT FCG. For spatial-temporal motion features, we first propose Diagonal Modulated GIoU, which more accurately represents the relationship between the position and shape of the objects. Second, we propose Mean Constant Velocity Modeling to reduce the effect of observation noise on target motion state estimation. For appearance features, we utilize a dynamic appearance representation that incorporates confidence information, making the trajectory appearance features more robust and global. Building on the baseline model MOT FCG, we obtain further performance improvements: we achieve 63.1 HOTA, 76.9 MOTA, and 78.2 IDF1 on the MOT17 test set, along with competitive performance on the MOT20 and DanceTrack sets.
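The abstract names its two motion-side components without defining them, so the sketch below illustrates only the underlying building blocks, not the paper's formulas. `giou` is the standard Generalized IoU for axis-aligned boxes; the diagonal modulation term is not described here and is deliberately left out. `mean_velocity` is a hypothetical reading of "mean constant velocity": it averages per-frame center displacements over a short window, which may differ from the authors' model. The function names, the box layout `(x1, y1, x2, y2)`, and the windowing are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)
Point = Tuple[float, float]


def giou(a: Box, b: Box) -> float:
    """Standard Generalized IoU of two axis-aligned boxes (no diagonal modulation)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0.0 or enclose <= 0.0:
        return 0.0  # degenerate boxes: no meaningful overlap score

    return inter / union - (enclose - union) / enclose


def mean_velocity(centers: Sequence[Point]) -> Point:
    """Average per-frame displacement of box centers over a window (hypothetical)."""
    steps = len(centers) - 1
    if steps < 1:
        return (0.0, 0.0)  # a single observation carries no velocity information
    vx = sum(centers[i + 1][0] - centers[i][0] for i in range(steps)) / steps
    vy = sum(centers[i + 1][1] - centers[i][1] for i in range(steps)) / steps
    return (vx, vy)
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes: two disjoint boxes receive a negative score that shrinks toward -1 as they move apart, which is one reason GIoU-style measures are attractive as association costs.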

Paper Structure

This paper contains 14 sections, 4 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Comparison of Different Trackers on the MOT17-test Set and MOT20-test Set in Terms of IDF1, HOTA, and MOTA. The horizontal axis represents HOTA, while the vertical axis represents IDF1. The radius of each circle corresponds to MOTA. Our method, MOT_FCG++, achieves 76.9 MOTA, 78.2 IDF1 and 63.1 HOTA on the MOT17-test set and 68.1 MOTA, 72.3 IDF1 and 58.4 HOTA on the MOT20-test set, demonstrating strong competitiveness. Please refer to Table \ref{tab:w1} for further details.
  • Figure 2: Illustration of MOT_FCG++
  • Figure 3: Comparison between IoU and Diagonal Modulated GIoU
  • Figure 4: Limitations of Median Feature Representation
  • Figure 5: Illustrations of MOT17, MOT20, and DanceTrack
  • ...and 2 more figures