An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking
Pratyusha Musunuru, Yuchao Li, Jamison Weber, Dimitri Bertsekas
TL;DR
This work addresses occlusion in multi-object tracking by recasting data association as an approximate dynamic programming problem and introducing ADPTrack, a near-online framework that leverages a few future frames to refine track-object associations. Built as a general wrapper around any online MOT method, ADPTrack generates tentative tracks over a horizon of $\ell$ frames and forms a softened value-function approximation via a convex combination of base weights and tentative-track similarities, controlled by the tuning parameter $\alpha$. Empirically, on MOT17 the approach yields an IDF1 improvement of 2.1% on the validation set and 0.7% on the test set, with concurrent gains in other metrics; the gains are most pronounced in scenarios with fixed cameras. The method incurs extra computation and delay, but benefits from parallelizable similarity computations. Overall, ADPTrack provides a principled, flexible framework for improving occlusion-robust data association in MOT, with clear avenues for longer horizons and broader base-tracker integration.
Abstract
In this work, we consider data association problems involving multi-object tracking (MOT). In particular, we address the challenges arising from object occlusions. We propose a framework called approximate dynamic programming track (ADPTrack), which applies dynamic programming principles to improve an existing method called the base heuristic. Given a set of tracks and the next target frame, the base heuristic extends the tracks by matching them to the objects of this target frame directly. In contrast, ADPTrack first processes a few subsequent frames and applies the base heuristic starting from the next target frame to obtain tentative tracks. It then leverages the tentative tracks to match the objects of the target frame. This tends to reduce the occlusion-based errors and leads to an improvement over the base heuristic. When tested on the MOT17 video dataset, the proposed method demonstrates a 0.7% improvement in the association accuracy (IDF1 metric) over a state-of-the-art method that is used as the base heuristic. It also obtains improvements with respect to all the other standard metrics. Empirically, we found that the improvements are particularly pronounced in scenarios where the video data is obtained by fixed-position cameras.
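The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy numbers, the direction of the weighting (here `alpha` multiplies the tentative-track similarities), and the brute-force matcher are all assumptions made for illustration. The sketch assumes that each object in the target frame starts one tentative track, so both weight matrices are indexed by (existing track, target-frame object).

```python
from itertools import permutations


def blend_weights(base, tentative, alpha):
    """Convex combination of two weight matrices given as lists of lists.

    Assumption: alpha = 0 reproduces the base heuristic's weights and
    alpha = 1 uses only the tentative-track similarities.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * b + alpha * t for b, t in zip(base_row, tent_row)]
        for base_row, tent_row in zip(base, tentative)
    ]


def best_assignment(weights):
    """Brute-force maximum-weight matching for a small square matrix.

    Returns a tuple p where track i is matched to object p[i]. A real
    tracker would use a polynomial-time assignment solver instead.
    """
    n = len(weights)
    return max(
        permutations(range(n)),
        key=lambda perm: sum(weights[i][perm[i]] for i in range(n)),
    )


# Hypothetical toy data: 2 existing tracks, 2 objects in the target frame.
# Base weights: a partial occlusion makes track 0 look more like object 1.
base = [[0.4, 0.6],
        [0.5, 0.5]]
# Similarity between each existing track and the tentative track that
# starts at each target-frame object and runs over a few later frames.
tentative = [[0.9, 0.1],
             [0.2, 0.8]]

base_only = best_assignment(base)                                # (1, 0)
blended = best_assignment(blend_weights(base, tentative, 0.5))   # (0, 1)
```

In this toy case the base weights alone swap the two identities, while the blended weights recover the assignment that the later frames support. With `alpha = 0` the blended matrix equals the base matrix, so the sketch reduces to the base heuristic.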
