View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV

Deyi Ji, Lanyun Zhu, Siqi Gao, Qi Zhu, Yiru Zhao, Peng Xu, Yue Ding, Hongtao Lu, Jieping Ye, Feng Wu, Feng Zhao

TL;DR

HomView-MOT tackles multi-object tracking in moving UAV scenarios by integrating scene homography with a fast estimation step, view-centric identity learning, and homographic cross-frame matching. The Fast Homography Estimation provides efficient view projection, while View-Centric ID Learning and the Homographic Matching Filter robustly handle view changes and camera motion, yielding state-of-the-art results on VisDrone2019 and UAVDT. The approach demonstrates strong generalization through geometry-based compensation and cross-view ID features, offering practical gains for UAV-based tracking in dynamic environments. Overall, the framework combines geometric view alignment with learned cross-view ID representations to advance MOT under moving-camera conditions.

Abstract

In this paper, we address the challenge of Multi-Object Tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning left/right, and moving up/down, make the task significantly more complex than fixed-camera MOT. Specifically, changes in the scene background not only render traditional frame-to-frame object IoU association methods ineffective but also introduce significant view shifts in the objects, which complicates tracking. To overcome these issues, we propose a novel HomView-MOT framework, which, for the first time, harnesses the view homography inherent in changing scenes to solve MOT challenges in moving environments, incorporating homographic matching and view-centric concepts. We introduce a Fast Homography Estimation (FHE) algorithm for rapid computation of homography matrices between video frames; these matrices enable object View-Centric ID Learning (VCIL), which leverages multi-view homography to learn cross-view ID features. Concurrently, our Homographic Matching Filter (HMF) maps object bounding boxes from different frames onto a common view plane for a more realistic physical IoU association. Extensive experiments demonstrate that these innovations allow HomView-MOT to achieve state-of-the-art performance on the prominent UAV MOT datasets VisDrone and UAVDT.

Paper Structure (28 sections, 15 equations, 9 figures, 6 tables, 1 algorithm)

Figures (9)

  • Figure 1: Tracking failures caused by irregular UAV flight states (turning right, hovering), which make ordinary IoU association ineffective. For convenience, we directly map the object box coordinates from the previous frame to the next frame (the Box Position Line illustrates this process). As seen, the IoU between the boxes of the same object across two frames (the Directly Mapped Box of the previous frame and the Originally Detected Box in the next frame) is very small and may be zero, so the tracker easily loses the object. The proposed Homographic Matching Filter uses the homographic spatial relationship between the two frames to map the previous-frame box onto the next frame before performing IoU association (the Homographic Mapped Box of the previous frame and the Originally Detected Box in the next frame), yielding a substantially higher and more reasonable IoU (the physical IoU in the real world).
  • Figure 2: The irregular flight trajectory of the UAV causes the captured objects to exhibit a richer variety of view changes than in a fixed-camera scene, posing greater challenges for MOT algorithms.
  • Figure 3: Overview of our proposed HomView-MOT. It begins with a Fast Homography Estimation (FHE) algorithm that rapidly calculates the homography transformation matrix between any two adjacent frames for view scene projection. Subsequently, View-Centric ID Learning (VCIL) is used to explicitly assimilate the multi-view information and update the ID features. For tracking, the Homographic Matching Filter (HMF) projects the object boxes from adjacent frames onto a common view plane for IoU association. This is coupled with ID association and passed through a matching algorithm to produce the final tracking results.
  • Figure 4: An example of the calculation of a homography matrix. The lines indicate the matched keypoints between the two frames.
  • Figure 5: Illustration of the sampled key frames and non-sampled frames in the video sequence, where $h=5$, for example.
  • ...and 4 more figures
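
The idea described in the Figure 1 caption, mapping a previous-frame box through the inter-frame homography before computing IoU, can be illustrated with a minimal sketch. This is not the authors' HMF implementation: the function names `warp_box` and `iou`, the choice to enclose the warped corners in an axis-aligned box, and the pure-translation homography `H` are all assumptions made for illustration.

```python
import numpy as np

def warp_box(box, H):
    """Project an axis-aligned box (x1, y1, x2, y2) through a 3x3 homography H
    and return the axis-aligned box enclosing the four warped corners.
    (Enclosing the corners is an illustrative simplification.)"""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1.0], [x2, y1, 1.0],
                        [x2, y2, 1.0], [x1, y2, 1.0]])
    warped = corners @ H.T                    # apply H to homogeneous points
    warped = warped[:, :2] / warped[:, 2:3]   # back to pixel coordinates
    return (warped[:, 0].min(), warped[:, 1].min(),
            warped[:, 0].max(), warped[:, 1].max())

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# Hypothetical example: camera motion shifts the scene by 60 px in x,
# so a static object appears at a different image position in the next frame.
H = np.array([[1.0, 0.0, 60.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
prev_box = (100.0, 100.0, 140.0, 180.0)   # box detected in the previous frame
next_box = (160.0, 100.0, 200.0, 180.0)   # same object in the next frame

direct_iou = iou(prev_box, next_box)               # directly mapped box
homog_iou = iou(warp_box(prev_box, H), next_box)   # homographically mapped box
print(direct_iou, homog_iou)
```

In this toy case the directly mapped box does not overlap the detection at all, while the homographically mapped box coincides with it, mirroring the contrast the caption draws. A real inter-frame homography would also include rotation and perspective terms, which `warp_box` handles through the homogeneous division.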