STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Jianbo Ma; Chuanming Tang; Fei Wu; Can Zhao; Jianlin Zhang; Zhiyong Xu

TL;DR

STCMOT is a novel Spatio-Temporal Cohesion Multiple Object Tracking framework that uses historical embedding features to model the representation of ReID and detection features in sequential order.

Abstract

Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching of target re-identification (ReID). These methods focus on optimizing target spatial attributes while overlooking temporal cues when modelling object relationships, especially under challenging tracking conditions such as object deformation and blurring. To address these issues, we propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT), which utilizes historical embedding features to model the representation of ReID and detection features in a sequential order. Concretely, a temporal embedding boosting module is introduced to enhance the discriminability of individual embeddings based on adjacent frame cooperation. The trajectory embedding is then propagated by a temporal detection refinement module to mine salient target locations in the temporal field. Extensive experiments on the VisDrone2019 and UAVDT datasets demonstrate that STCMOT sets a new state of the art in the MOTA and IDF1 metrics. The source code is released at https://github.com/ydhcg-BoBo/STCMOT.
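
For readers unfamiliar with the two metrics named in the abstract, the sketch below shows the standard definitions of MOTA (from the CLEAR MOT metrics) and IDF1 (identity F1). It is only an illustration of those formulas: the function names and the counts passed to them are hypothetical, are not taken from the paper or its repository, and do not reproduce any reported result.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over every frame."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), using identity-level matches."""
    denom = 2 * idtp + idfp + idfn
    # Return 0.0 when there are no detections and no ground truth at all.
    return 2 * idtp / denom if denom else 0.0


# Made-up counts for illustration only; these are not results from the paper.
print(round(mota(false_negatives=200, false_positives=100, id_switches=10, num_gt=1000), 3))
print(round(idf1(idtp=800, idfp=100, idfn=200), 3))
```

MOTA penalizes missed targets, false alarms, and identity switches together, while IDF1 measures how consistently each target keeps the same identity, which is why the two are usually reported side by side.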

Paper Structure (13 sections, 6 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Methodology
Overall Framework
Temporal Embedding Boosting Module
Temporal Detection Refinement Module
Objective Function
Experiments
Implementation Details
Online Inference
Comparison with State of the Arts
Ablation Study
Visualization and Analysis
Conclusion

Figures (3)

Figure 1: t-SNE projection of ReID embeddings for the 15 tracked targets in the UAVDT-M1007 video. STCMOT shows a more discriminative embedding representation than the baseline FairMOT, even when targets #1 and #2 have similar appearances at nighttime.
Figure 2: Architecture details of STCMOT. It consists of three components: a Frame Feature Extractor, a Temporal Embedding Boosting Module, and a Temporal Detection Refinement Module.
Figure 3: Visualization of STCMOT and the baseline FairMOT on the VisDrone2019-uav0305 and UAVDT-M0205 sets. Each bounding box with a unique ID number represents a tracked target.
