RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network

Boyue Xu, Yi Xu, Ruichao Hou, Jia Bei, Tongwei Ren, Gangshan Wu

TL;DR

This work addresses the need for real-time, robust RGB-D tracking by introducing HMAD, a Hierarchical Modality Aggregation and Distribution network built on a DiMP baseline. HMAD uses a two-stage architecture: CBAM-based shallow feature extraction, followed by a hierarchical distribution and fusion module that combines RGB texture and depth semantics from multiple feature levels. Ablation studies and experiments on DepthTrack and RGBD1K demonstrate state-of-the-art accuracy at around 15 FPS on an edge device, while real-world tests confirm robustness to occlusion, similar-target interference, and dim lighting. The approach is practically relevant to robotics and HCI, where reliable tracking must run within the constraints of resource-limited platforms.

Abstract

The integration of dual-modal features has been pivotal in advancing RGB-Depth (RGB-D) tracking. However, current trackers are less efficient and focus solely on single-level features, resulting in weaker robustness in fusion and slower speeds that fail to meet the demands of real-world applications. In this paper, we introduce a novel network, denoted as HMAD (Hierarchical Modality Aggregation and Distribution), which addresses these challenges. HMAD leverages the distinct feature representation strengths of RGB and depth modalities, giving prominence to a hierarchical approach for feature distribution and fusion, thereby enhancing the robustness of RGB-D tracking. Experimental results on various RGB-D datasets demonstrate that HMAD achieves state-of-the-art performance. Moreover, real-world experiments further validate HMAD's capacity to effectively handle a spectrum of tracking challenges in real-time scenarios.

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparison results with representative trackers on the DepthTrack dataset.
  • Figure 2: The framework of HMAD, which consists of a backbone, a hierarchical modality aggregation and distribution network, and a target discrimination model.
  • Figure 3: The details of the feature distribution.
  • Figure 4: Qualitative comparison between HMAD and other trackers on three challenging sequences in the DepthTrack dataset.
  • Figure 5: Real-world test results of the proposed tracker. Tracking results are marked in red and the ground truth is marked in green.