CAMOT: Camera Angle-aware Multi-Object Tracking
Felix Limanta, Kuniaki Uto, Koichi Shinoda
TL;DR
CAMOT tackles occlusion and depth misestimation in multi-object tracking by estimating a camera elevation angle from object detections under a flat-plane assumption and computing per-object depths to enable pseudo-3D MOT. It jointly optimizes a common plane and the camera angle using an iterative, Nelder–Mead-based process, followed by temporal smoothing, and integrates depth-aware coordinates into a 3D-aware Kalman filter and a camera-angle–aware association metric. When plugged into 2D MOT systems like ByteTrack, CAMOT achieves state-of-the-art HOTA, MOTA, and IDF1 on MOT17 and MOT20 with real-time performance and substantially lower computational cost than monocular depth estimators. The method is lightweight, extensible to other trackers, and offers a practical path toward robust tracking in surveillance scenarios where depth cues are scarce or expensive to compute.
Abstract
This paper proposes CAMOT, a simple camera angle estimator for multi-object tracking that tackles two problems: 1) occlusion and 2) inaccurate distance estimation in the depth direction. Under the assumption that the objects in each video frame are located on a flat plane, CAMOT estimates the camera angle from the object detections. It also gives the depth of each object, enabling pseudo-3D MOT. We evaluated its performance by adding it to various 2D MOT methods on the MOT17 and MOT20 datasets and confirmed its effectiveness. Applying CAMOT to ByteTrack, we obtained 63.8% HOTA, 80.6% MOTA, and 78.5% IDF1 on MOT17, which are state-of-the-art results. Its computational cost is significantly lower than that of existing deep-learning-based depth estimators for tracking.
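To make the flat-plane idea concrete, the following is a minimal sketch of how a camera elevation angle and per-object depths could be recovered from bounding boxes alone. It is not the paper's formulation. It assumes a pinhole camera with a known focal length in pixels, no camera roll, box coordinates measured as vertical pixel offsets from the principal point (positive downward), and, as an extra simplification, that all tracked people share the same physical height. A plain grid search over the angle stands in for the Nelder-Mead optimization mentioned above, and all function and parameter names (`ground_depth`, `implied_camera_heights`, `estimate_theta`, `focal_px`, `person_height`) are hypothetical.

```python
import math

def ground_depth(y_foot, focal_px, cam_height, theta):
    """Depth along the optical axis of the ground point seen at y_foot.

    theta is the camera's downward tilt from the horizontal, in radians.
    """
    alpha = math.atan2(y_foot, focal_px)   # ray angle below the optical axis
    ray_angle = theta + alpha              # ray angle below the horizontal
    if not 0.0 < ray_angle < math.pi:
        raise ValueError("ray does not intersect the ground plane")
    ray_length = cam_height / math.sin(ray_angle)
    return ray_length * math.cos(alpha)

def implied_camera_heights(boxes, focal_px, theta, person_height=1.0):
    """Camera height implied by each (y_top, y_foot) box for a candidate theta.

    Assumption (not from the paper): every person has height person_height,
    so the foot ray and the head ray must meet the same vertical line.
    Returns None if the candidate angle is geometrically invalid.
    """
    heights = []
    for y_top, y_foot in boxes:
        a = theta + math.atan2(y_foot, focal_px)   # foot ray below horizontal
        b = theta + math.atan2(y_top, focal_px)    # head ray below horizontal
        if not 0.0 < a < math.pi / 2 or b >= a:
            return None
        ta, tb = math.tan(a), math.tan(b)
        heights.append(person_height * ta / (ta - tb))
    return heights

def estimate_theta(boxes, focal_px, steps=900):
    """Grid search for the angle at which all boxes agree on the camera height."""
    best_theta, best_loss = None, math.inf
    for k in range(1, steps):
        theta = math.radians(90.0 * k / steps)
        heights = implied_camera_heights(boxes, focal_px, theta)
        if heights is None:
            continue
        mean = sum(heights) / len(heights)
        # Squared coefficient of variation: zero when all boxes agree.
        loss = sum((h - mean) ** 2 for h in heights) / (len(heights) * mean ** 2)
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta
```

Given the estimated angle, the camera height follows from `implied_camera_heights` (in units of the assumed person height), and `ground_depth` then maps the bottom edge of each box to a depth that a tracker could use alongside the 2D position. With real detections the boxes are noisy and heights vary, so the agreement is only approximate and some temporal smoothing of the per-frame estimate would be needed.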
