Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing
Juliano Pinto, Georg Hess, Yuxuan Xia, Henk Wymeersch, Lennart Svensson
TL;DR
This work introduces D3AS, a transformer-based framework for multi-object smoothing that decouples data association, handled by a Deep Data Associator (DDA), from trajectory smoothing, handled by a Deep Smoother (DS). The DDA predicts a soft association matrix $A\in\mathbb{R}^{n\times B}$ over measurements and tracks, which is partitioned to form per-track inputs for the DS, which in turn outputs trajectory estimates $(\hat{\boldsymbol x}_{1:T}, p_{1:T}, \bar p)$. Training uses two dedicated losses: a Deep Data Associator Loss that aligns predictions to ground-truth associations via a permutation-invariant assignment, and a Deep Smoother Loss that maximizes the likelihood of ground-truth trajectories under a multi-Bernoulli density. Across ten tasks with varying clutter and detection probability, D3AS generally outperforms the model-based TPMBM, particularly in challenging scenarios where data association is hard, while the decoupling offers better interpretability and faster convergence. The results validate the potential of transformer-based smoothing in low-dimensional measurement regimes and provide the first comparative study against Bayesian trackers in this smoothing context.
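The two association steps summarized above (partitioning $A$ into per-track inputs, and scoring it with a permutation-invariant loss) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: rows of $A$ are taken to index the $n$ measurements and columns the $B$ tracks, each row is treated as a probability distribution with no zero entries, partitioning is done by a hard row-wise argmax, and the optimal matching is found by brute force over permutations rather than by a dedicated assignment solver. The function names `partition_by_track` and `permutation_invariant_nll` are hypothetical.

```python
import math
from itertools import permutations


def partition_by_track(assoc):
    """Group measurement indices by their most likely track (row-wise argmax).

    Assumption: a hard argmax partition; the source only says A is "partitioned".
    """
    n_tracks = len(assoc[0])
    groups = [[] for _ in range(n_tracks)]
    for i, row in enumerate(assoc):
        best = max(range(n_tracks), key=lambda b: row[b])
        groups[best].append(i)
    return groups


def permutation_invariant_nll(assoc, true_track):
    """Lowest mean negative log-likelihood over all relabelings of the tracks.

    Assumption: brute force over permutations, viable only for a few tracks.
    """
    n_tracks = len(assoc[0])
    best = math.inf
    for perm in permutations(range(n_tracks)):
        # perm[t] is the predicted column matched to ground-truth track t.
        nll = -sum(math.log(assoc[i][perm[t]]) for i, t in enumerate(true_track))
        best = min(best, nll / len(true_track))
    return best


# Toy soft association matrix: n = 4 measurements (rows), B = 2 tracks (columns).
A = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.4, 0.6],
]
groups = partition_by_track(A)  # measurement indices assigned to each track
loss_a = permutation_invariant_nll(A, [0, 1, 0, 1])
loss_b = permutation_invariant_nll(A, [1, 0, 1, 0])  # same grouping, labels swapped
```

Swapping the ground-truth track labels leaves the loss unchanged, which is the property a permutation-invariant assignment is meant to provide: the network is not penalized for the arbitrary order in which it numbers its tracks.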
Abstract
Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window. Several algorithms have been proposed to tackle the multi-object smoothing task, where object detections can be conditioned on all the measurements in the time window. However, the best-performing methods suffer from intractable computational complexity and require approximations, performing suboptimally in complex settings. Deep learning (DL) based algorithms are a possible avenue for tackling this issue but have not been applied extensively in settings where accurate multi-object models are available and measurements are low-dimensional. We propose a novel DL architecture specifically tailored for this setting that decouples the data association task from the smoothing task. We compare the performance of the proposed smoother to the state of the art in different tasks of varying difficulty and provide, to the best of our knowledge, the first comparison between traditional Bayesian trackers and DL trackers in the smoothing problem setting.
