CAMOT: Camera Angle-aware Multi-Object Tracking

Felix Limanta; Kuniaki Uto; Koichi Shinoda

TL;DR

CAMOT tackles occlusion and inaccurate depth-direction distance estimation in multi-object tracking by estimating the camera elevation angle from object detections under a flat-plane assumption and computing per-object depths to enable pseudo-3D MOT. It jointly optimizes a common plane and the camera angle using an iterative, Nelder–Mead-based process, followed by temporal smoothing, and feeds the depth-aware coordinates into a 3D-aware Kalman filter and a camera-angle-aware association metric. When plugged into 2D MOT systems such as ByteTrack, CAMOT achieves state-of-the-art HOTA, MOTA, and IDF1 on MOT17 and MOT20 with real-time performance and substantially lower computational cost than monocular depth estimators. The method is lightweight, extensible to other trackers, and offers a practical path toward robust tracking in surveillance scenarios where depth cues are scarce or expensive to compute.

Abstract

This paper proposes CAMOT, a simple camera angle estimator for multi-object tracking that tackles two problems: 1) occlusion and 2) inaccurate distance estimation in the depth direction. Under the assumption that multiple objects are located on a flat plane in each video frame, CAMOT estimates the camera angle using object detection. In addition, it gives the depth of each object, enabling pseudo-3D MOT. We evaluated its performance by adding it to various 2D MOT methods on the MOT17 and MOT20 datasets and confirmed its effectiveness. Applying CAMOT to ByteTrack, we obtained 63.8% HOTA, 80.6% MOTA, and 78.5% IDF1 on MOT17, which are state-of-the-art results. Its computational cost is significantly lower than that of existing deep-learning-based depth estimators for tracking.
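
The flat-plane idea in the abstract can be illustrated with a minimal side-view sketch. This is not the authors' implementation; it rests on several assumptions that do not come from the source: a pinhole camera with known focal length `f` and principal-point row `cy` (both in pixels), a single shared object height `obj_height`, and a ground plane that reduces to a constant camera height. A plain grid search stands in for the Nelder–Mead optimizer mentioned in the TL;DR, and temporal smoothing is omitted. All function names are hypothetical.

```python
import math

def depth_and_camera_height(v_top, v_bottom, theta, f, cy, obj_height):
    """Return (horizontal distance, camera height above the feet) for one box.

    Each image row v defines a ray whose depression angle is
    theta + atan((v - cy) / f). An upright object of assumed height
    obj_height must fit between the head ray and the foot ray.
    Returns None if the geometry is infeasible for this theta.
    """
    phi_top = theta + math.atan((v_top - cy) / f)
    phi_bottom = theta + math.atan((v_bottom - cy) / f)
    if not (-math.pi / 2 < phi_top < phi_bottom < math.pi / 2):
        return None
    d = obj_height / (math.tan(phi_bottom) - math.tan(phi_top))
    return d, d * math.tan(phi_bottom)

def plane_error(boxes, theta, f, cy, obj_height):
    """Variance of the implied camera heights; zero if all feet share one plane."""
    heights = []
    for v_top, v_bottom in boxes:
        result = depth_and_camera_height(v_top, v_bottom, theta, f, cy, obj_height)
        if result is None:
            return math.inf
        heights.append(result[1])
    mean = sum(heights) / len(heights)
    return sum((h - mean) ** 2 for h in heights) / len(heights)

def estimate_elevation_angle(boxes, f, cy, obj_height, steps=900):
    """Grid search over (0, 90) degrees; a stand-in for Nelder-Mead."""
    candidates = [math.radians(90.0 * i / steps) for i in range(1, steps)]
    return min(candidates, key=lambda t: plane_error(boxes, t, f, cy, obj_height))

def project_box(d, cam_height, obj_height, theta, f, cy):
    """Synthetic (v_top, v_bottom) for an object at horizontal distance d."""
    v_bottom = cy + f * math.tan(math.atan2(cam_height, d) - theta)
    v_top = cy + f * math.tan(math.atan2(cam_height - obj_height, d) - theta)
    return v_top, v_bottom

# Synthetic scene (all values are illustrative assumptions):
# camera 5 m above the ground, pitched down 30 degrees, objects 1.7 m tall.
true_theta = math.radians(30.0)
scene = [project_box(d, 5.0, 1.7, true_theta, 1000.0, 540.0)
         for d in (6.0, 10.0, 15.0, 25.0)]
estimate = estimate_elevation_angle(scene, 1000.0, 540.0, 1.7)
print(f"estimated elevation angle: {math.degrees(estimate):.1f} degrees")
```

In this sketch a wrong angle makes the implied camera height drift with distance, so the variance is zero only at the true angle. At least two objects at different distances are needed for the angle to be identifiable.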

Paper Structure (25 sections, 11 equations, 3 figures, 7 tables)

Introduction
Related Works
2D Multi-Object Tracking (MOT)
Occlusion
Depth Estimation
Depth Estimation for MOT
Camera Pose Estimation Methods
Camera Angle Estimation
Outline
Assumptions and Problem Formulation
Bounding Box Selection
Initial Elevation Angle Setting
Depth Estimation Using Detection Results
Plane Estimation
Error Calculation
...and 10 more sections

Figures (3)

Figure 1: Illustration of the idea behind CAMOT. Under the assumption that multiple objects are located on a flat plane, the camera angle is estimated using object detection. The scale of each bounding box indicates the depth of each object, whereas the distribution of the bounding boxes informs us of the camera angle.
Figure 2: 2D planar side view of the system. Black parts show the components shared by all objects, whereas blue and red parts show different objects.
Figure 3: 2D planar side view for one object. Black parts show the components shared by all objects, while blue parts show components unique to object $i$. Green parts show derived points, angles, etc., used in the calculation.
