
An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking

Pratyusha Musunuru, Yuchao Li, Jamison Weber, Dimitri Bertsekas

TL;DR

This work addresses occlusion in multi-object tracking by recasting data association as an approximate dynamic programming problem and introducing ADPTrack, a near-online framework that leverages a few future frames to refine track-object associations. Built as a general wrapper around any online MOT method, ADPTrack generates tentative tracks over a horizon of $\ell$ frames and forms a softened value-function approximation via a convex combination of base weights and tentative-track similarities, controlled by the tuning parameter $\alpha$. Empirically, on MOT17 the approach yields an IDF1 improvement of 2.1% on the validation set and 0.7% on the test set, with concurrent gains in other metrics, particularly in scenarios with fixed cameras. It incurs extra computation and delay, but its similarity computations are parallelizable. Overall, ADPTrack provides a principled, flexible framework for improving occlusion-robust data association in MOT, with clear avenues for longer horizons and broader base-tracker integration.

Abstract

In this work, we consider data association problems involving multi-object tracking (MOT). In particular, we address the challenges arising from object occlusions. We propose a framework called approximate dynamic programming track (ADPTrack), which applies dynamic programming principles to improve an existing method called the base heuristic. Given a set of tracks and the next target frame, the base heuristic extends the tracks by matching them to the objects of this target frame directly. In contrast, ADPTrack first processes a few subsequent frames and applies the base heuristic starting from the next target frame to obtain tentative tracks. It then leverages the tentative tracks to match the objects of the target frame. This tends to reduce the occlusion-based errors and leads to an improvement over the base heuristic. When tested on the MOT17 video dataset, the proposed method demonstrates a 0.7% improvement in the association accuracy (IDF1 metric) over a state-of-the-art method that is used as the base heuristic. It also obtains improvements with respect to all the other standard metrics. Empirically, we found that the improvements are particularly pronounced in scenarios where the video data is obtained by fixed-position cameras.
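The matching step described above can be illustrated with a small sketch. It is not the authors' implementation: the function names `blend_weights` and `best_matching` are hypothetical, the toy similarity values are made up for illustration, and placing $\alpha$ on the tentative-track term (rather than the base term) is an assumption. A brute-force search over permutations stands in for whatever assignment solver a real base tracker would use.

```python
from itertools import permutations


def blend_weights(base, tentative, alpha):
    """Convex combination of two similarity matrices (lists of rows).

    Assumption: alpha weights the tentative-track similarities and
    (1 - alpha) weights the base heuristic's own similarities.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * b + alpha * t for b, t in zip(base_row, tent_row)]
        for base_row, tent_row in zip(base, tentative)
    ]


def best_matching(weights):
    """Return the track-to-object assignment with maximum total weight.

    Brute force over permutations; only suitable for tiny square matrices.
    Entry i of the result is the object index matched to track i.
    """
    n = len(weights)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(weights[i][p[i]] for i in range(n)),
    )
    return list(best)


# Toy data (hypothetical): rows are given tracks, columns are target-frame objects.
# The base similarities are misleading, as might happen right after an occlusion.
base = [[0.4, 0.6], [0.6, 0.4]]
# Similarities between given tracks and the tentative tracks that start at each object.
tentative = [[0.9, 0.1], [0.1, 0.9]]

print(best_matching(blend_weights(base, tentative, 0.0)))  # [1, 0]: base weights only
print(best_matching(blend_weights(base, tentative, 0.5)))  # [0, 1]: blended weights
```

With `alpha = 0.0` the matching reduces to the base weights alone and swaps the two identities; with `alpha = 0.5` the tentative-track evidence outweighs the misleading base similarities in this toy case.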

Paper Structure (22 sections, 15 equations, 6 figures, 20 tables, 1 algorithm)


Figures (6)

  • Figure 1: Frame-by-frame comparison of BoT-SORT (a-d) versus ADPTrack with BoT-SORT as the base heuristic (e-h) in an example. Using BoT-SORT directly, person (4) in (a) is erroneously assigned another person's identifier (14) in (c)-(d) after being occluded by person (6) in (b). When applying ADPTrack with BoT-SORT as the base heuristic, the same person (4) keeps the same identifier in (a) and (c)-(d) despite being occluded by person (16) [person (6) in (b)].
  • Figure 2: (a) Overview of key components involved in ADPTrack for MDA (MOT). Red layers and respective arcs represent objects formed in partial groupings (given tracks) up to frame $k$. Blue layer $k+1$ represents the target frame. Green arcs between $k$ and $k+1$ layers represent all pairwise arcs between $\mathcal{N}_k$ and $\mathcal{N}_{k+1}$. Dashed blue matching indicates a selected control $u_k$. Orange layers and arcs represent subsequent frames. (b) Visualization of near-online simulation in ADPTrack. The base heuristic is applied to simulate the solution of the MOT problem starting at frame $k+1$ and ending at frame $k+\ell+1$. The obtained tentative tracks are shown by the orange solid lines. (c) Illustration of computation of similarity scores $c_{k+1}^{ij}(x_k)$. For example, the weight $c_{k+1}^{12}(x_k)$ is assigned to the bold blue dashed arc, which is dependent on the given track (bold red) and the tentative track (bold orange) that the arc connects.
  • Figure 3: Video-wise IDF1($\uparrow$) scores of BoT-SORT and ADPTrack with BoT-SORT as the base heuristic when applied to videos of the MOT17 dataset.
  • Figure 4: Video-wise IDSW($\downarrow$) scores of BoT-SORT and ADPTrack with BoT-SORT as the base heuristic when applied to videos of the MOT17 dataset.
  • Figure 5: Percentage improvement of ADPTrack with BoT-SORT as the base heuristic over BoT-SORT itself across several MOT metrics on the validation dataset.
  • ...and 1 more figure