Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model
Jan Krejčí, Oliver Kost, Ondřej Straka, Jindřich Duník
TL;DR
This work tackles monocular pedestrian tracking without imposing a ground-plane constraint by introducing a 3D state $\boldsymbol{x}^{\text{3D}}$ that combines position, velocity, and 3D extents (width $\omega$ and height $h$). A nonlinear measurement model arises from perspective projection, and an unscented Kalman filter (UKF) is used to fuse monocular detections into 3D estimates via a carefully designed AR process for the bounding-box extents. The contributions include a complete 3D state-space formulation, interpretable process parameters (means and time constants for width/height), and a numerically stable UKF implementation with initialization strategies; evaluation on MOT-17 demonstrates consistent 2D projections and plausible 3D trajectories, with ANEES near one and RMSE improvements over 2D baselines. The approach enables depth-aware pedestrian tracking from a single camera and provides a foundation for estimating the scene ground plane from tracked trajectories in future work.
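The two modeling ingredients named above, a mean-reverting AR process for the extents and a perspective-projection measurement, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the state ordering `[px, py, pz, vx, vy, vz, width, height]`, a pinhole camera with focal length `f` and principal point `(cx, cy)`, a camera frame whose z-axis is depth, and a first-order (AR(1)) discretization with time constant `tau` and stationary standard deviation `sigma`. The function names are hypothetical.

```python
import numpy as np


def ar1_predict(x, mean, tau, sigma, dt):
    """One-step prediction of a mean-reverting AR(1) process (assumed form).

    x     : current value of the extent (e.g. height in metres)
    mean  : long-run mean the extent reverts to
    tau   : time constant in seconds (larger means slower reversion)
    sigma : stationary standard deviation of the extent
    dt    : sampling period in seconds
    Returns the predicted value and the process-noise variance.
    """
    a = np.exp(-dt / tau)            # AR coefficient from the time constant
    q = sigma**2 * (1.0 - a**2)      # keeps the stationary variance at sigma^2
    return mean + a * (x - mean), q


def project_bbox(state, f, cx, cy):
    """Pinhole projection of a 3D state to a 2D box (assumed parametrization).

    state : [px, py, pz, vx, vy, vz, width, height] in the camera frame,
            with pz > 0 the depth along the optical axis
    Returns [u, v, box_width, box_height] in pixels, (u, v) being the box centre.
    """
    px, py, pz = state[0:3]
    width, height = state[6], state[7]
    u = f * px / pz + cx
    v = f * py / pz + cy
    # Metric extents shrink in the image in proportion to 1/depth.
    return np.array([u, v, f * width / pz, f * height / pz])
```

Under these assumptions, the division by `pz` is what makes the measurement model nonlinear, and the known extent statistics (`mean`, `sigma`) are what make depth observable from the bounding-box size without a ground-plane constraint.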
Abstract
A first-principles single-object model is proposed for pedestrian tracking. It is assumed that the extent of the moving object can be described via known statistics in 3D, such as pedestrian height. The proposed model thus need not constrain the object motion in 3D to a common ground plane, a constraint that is usual in 3D visual tracking applications. A nonlinear filter for this model is implemented using the unscented Kalman filter (UKF) and tested using the publicly available MOT-17 dataset. The proposed solution yields promising results in 3D while maintaining perfect results when projected into the 2D image. Moreover, the estimation error covariance matches the true one. Unlike those of conventional methods, the parameters of the introduced model have an intuitive meaning and can readily be adjusted to a given problem.
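The claim that the estimated error covariance matches the true one is commonly checked with the average normalized estimation error squared (ANEES). The following Python sketch shows one way to compute it, assuming the convention in which the NEES is divided by the state dimension so that a consistent filter scores close to one. The function name and array layout are hypothetical, and this is not presented as the authors' evaluation code.

```python
import numpy as np


def anees(errors, covariances):
    """Average normalized estimation error squared.

    errors      : array of shape (K, n), estimate minus ground truth
    covariances : array of shape (K, n, n), filter covariance per sample
    Returns mean_k(e_k^T P_k^{-1} e_k) / n, which is close to 1 when the
    reported covariances match the actual error statistics.
    """
    errors = np.asarray(errors, dtype=float)
    covariances = np.asarray(covariances, dtype=float)
    n = errors.shape[1]
    # Solve P_k x_k = e_k for every sample instead of inverting P_k.
    solved = np.linalg.solve(covariances, errors[..., None])[..., 0]
    nees = np.einsum("ki,ki->k", errors, solved)
    return nees.mean() / n
```

Values well above one would indicate an overconfident filter (covariance too small), and values well below one an overly cautious filter.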
