TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking

Shuxiao Ding; Yutong Yang; Julian Wiederer; Markus Braun; Peizheng Li; Juergen Gall; Bin Yang

TL;DR

This work addresses a limitation of static, single-frame query denoising in 3D MOT by introducing Temporal Query Denoising (TQD). A Temporal Denoising Query Generator (TDQ-Gen) creates denoising queries from previous-frame ground truths and propagates them to the current frame. The method injects temporal cues and instance-specific features into MOT training, uses self-attention and association masks to keep query interactions consistent with inference, and explores dedicated, general, and hybrid denoising groups. Across the Tracking-by-Attention (TBA), Tracking-by-Detection (TBD), and Alternating Detection and Association (ADA) paradigms, temporal denoising yields higher AMOTA/MOTA and lower IDS, with the largest gains for paradigms that have an explicit association module; ADA-Track + TQD-Track achieves state-of-the-art results on nuScenes. The approach increases training diversity without changing inference and improves robustness to temporal uncertainty and rare behaviors in multi-view 3D MOT systems. The key technical contributions are TDQ-Gen, the denoising group strategies (general, dedicated, hybrid), an association mask for learned data association, and extensive ablations validating the benefits of temporal denoising in MOT.
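The idea of building denoising queries from previous-frame ground truths can be illustrated with a small sketch. This is not the authors' TDQ-Gen: the function name `make_temporal_denoising_queries`, the uniform center noise, the `noise_scale` parameter, and the use of `-1` as a "no match in the current frame" target are all illustrative assumptions, and the propagation of queries through the tracker is not modeled. The sketch only shows how noised copies of previous-frame boxes could be paired with current-frame targets through shared instance IDs.

```python
import random


def make_temporal_denoising_queries(prev_centers, prev_ids, curr_ids,
                                    num_groups=2, noise_scale=0.5, seed=0):
    """Hypothetical sketch: noise previous-frame GT centers and pair each
    noised copy with the index of the same instance in the current frame."""
    rng = random.Random(seed)

    # Map each current-frame instance ID to its index in the current GT list.
    curr_index = {obj_id: k for k, obj_id in enumerate(curr_ids)}

    # Target for each previous-frame object: its current-frame index, or -1
    # if the instance is absent from the current frame (assumed convention).
    targets = [curr_index.get(obj_id, -1) for obj_id in prev_ids]

    groups = []
    for _ in range(num_groups):
        # Each group holds an independently noised copy of every center.
        noised = [
            tuple(c + rng.uniform(-noise_scale, noise_scale) for c in center)
            for center in prev_centers
        ]
        groups.append({"centers": noised, "targets": list(targets)})
    return groups


if __name__ == "__main__":
    prev_centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    groups = make_temporal_denoising_queries(
        prev_centers, prev_ids=[7, 9], curr_ids=[9, 11], num_groups=3)
    for g in groups:
        print(g["targets"], g["centers"])
```

In this toy call, instance 7 has left the scene, so its queries receive the `-1` target, while instance 9 maps to index 0 of the current-frame ground truths.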

Abstract

Query denoising has become a standard training strategy for DETR-based detectors because it addresses their slow convergence. Query denoising can also increase the diversity of training samples for modeling complex scenarios, which is critical for Multi-Object Tracking (MOT) and suggests its potential for MOT applications. Existing approaches integrate query denoising within the tracking-by-attention paradigm. However, because the denoising process happens only within a single frame, it cannot help the tracker learn temporal information. In addition, the attention mask in query denoising prevents information exchange between denoising and object queries, limiting its potential to improve association through self-attention. To address these issues, we propose TQD-Track, which introduces Temporal Query Denoising (TQD) tailored for MOT, enabling denoising queries to carry temporal information and instance-specific feature representations. We apply diverse noise types to the denoising queries to simulate real-world challenges in MOT. We analyze the proposed TQD for different tracking paradigms and find that paradigms with an explicit learned data association module, e.g., tracking-by-detection or alternating detection and association, benefit from TQD by a larger margin. For these paradigms, we further design an association mask in the association module to ensure that track and detection queries interact during training in the same way as during inference. Extensive experiments on the nuScenes dataset demonstrate that our approach consistently improves different tracking methods by changing only the training process, especially the paradigms with an explicit association module.
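To make the attention-mask limitation concrete, the following is a minimal sketch of a conventional single-frame denoising mask. It assumes the common DN-DETR-style layout (denoising groups first, object queries last), which the abstract does not spell out. The function name `build_denoising_attention_mask` and its parameters are hypothetical, and this is the standard mask the abstract criticizes, not the association mask proposed in the paper.

```python
def build_denoising_attention_mask(num_groups, group_size, num_object_queries):
    """Return a square boolean mask where mask[q][k] == True means
    query q is NOT allowed to attend to key k.

    Assumed layout: [group 0 | group 1 | ... | object queries].
    """
    num_dn = num_groups * group_size
    total = num_dn + num_object_queries
    mask = [[False] * total for _ in range(total)]

    for q in range(total):
        for k in range(num_dn):
            if q >= num_dn:
                # Object queries must not see any denoising query,
                # otherwise ground-truth information would leak.
                mask[q][k] = True
            elif q // group_size != k // group_size:
                # Denoising queries must not see other denoising groups.
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    mask = build_denoising_attention_mask(
        num_groups=2, group_size=2, num_object_queries=3)
    for row in mask:
        print("".join("x" if blocked else "." for blocked in row))
```

In this sketch, object queries never receive information from denoising queries, so with a mask of this kind self-attention cannot be used to learn association between the two query sets.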
