Dynamic Disentangled Fusion Network for RGBT Tracking

Chenglong Li; Tao Wang; Zhaodong Ding; Yun Xiao; Jin Tang

TL;DR

This work tackles robust RGBT tracking under diverse and dynamic challenges by introducing the Dynamic Disentangled Fusion Network (DDFNet), which disentangles multimodal fusion into six attribute-specific dynamic branches plus a general branch. Each branch is built from router-guided fusion units (SCFU and SFU) that adaptively fuse RGB and thermal infrared (TIR) features, while an Adaptive Aggregation Fusion Module selects and weights the active branches and a Lightweight Enhancement Fusion Module strengthens the fused representations. A three-stage training procedure, together with LasHeR-based data generation, enables effective training of the dynamic fusion structure and its components. Experiments on GTOT, RGBT210, RGBT234, and LasHeR show state-of-the-art performance, with notable gains across multiple challenge attributes, indicating improved robustness and generalization in multimodal tracking. The approach offers a practical path to reliable RGBT tracking in real-world scenarios through adaptive, attribute-aware fusion, without heavy reliance on large-scale cross-modal data.
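
The router-guided idea can be illustrated with a minimal sketch under loudly stated assumptions: the two fusion units below (`mean_unit`, `max_unit`) are hypothetical stand-ins, not the paper's SCFU and SFU; features are flat Python lists rather than feature maps; and the router is a hypothetical linear map over globally pooled RGB and TIR responses whose weights would be learned in practice. The sketch shows only the mechanism: a router scores each fusion unit, a softmax turns the scores into weights, and the branch output is the weighted sum of the unit outputs.

```python
import math
from typing import Callable, List, Sequence

Feature = List[float]
FusionUnit = Callable[[Feature, Feature], Feature]


def mean_unit(rgb: Feature, tir: Feature) -> Feature:
    # Hypothetical stand-in unit: element-wise average of the two modalities.
    return [(r + t) / 2.0 for r, t in zip(rgb, tir)]


def max_unit(rgb: Feature, tir: Feature) -> Feature:
    # Hypothetical stand-in unit: keep the stronger response at each position.
    return [max(r, t) for r, t in zip(rgb, tir)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_logits(rgb: Feature, tir: Feature,
                  weights: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical router: one linear score per unit over the pooled (mean)
    # RGB and TIR responses. In a trained network these weights are learned.
    pooled = (sum(rgb) / len(rgb), sum(tir) / len(tir))
    return [w[0] * pooled[0] + w[1] * pooled[1] for w in weights]


def dynamic_branch(rgb: Feature, tir: Feature, units: Sequence[FusionUnit],
                   router_weights: Sequence[Sequence[float]]) -> Feature:
    if len(units) != len(router_weights):
        raise ValueError("the router needs one weight row per fusion unit")
    probs = softmax(router_logits(rgb, tir, router_weights))
    fused = [0.0] * len(rgb)
    for p, unit in zip(probs, units):
        out = unit(rgb, tir)
        for d in range(len(fused)):
            fused[d] += p * out[d]  # router-weighted sum of unit outputs
    return fused


if __name__ == "__main__":
    rgb, tir = [1.0, 4.0], [3.0, 2.0]
    units = [mean_unit, max_unit]
    print(dynamic_branch(rgb, tir, units, [[0.0, 0.0], [0.0, 0.0]]))  # [2.5, 3.5]
```

With equal router scores the branch simply averages the two units; as one score dominates, the branch effectively selects a single unit, which is one way a branch could vary its fusion with the difficulty of the input.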

Abstract

RGBT tracking usually suffers from various challenging factors, such as low resolution, similar appearance, extreme illumination, thermal crossover, and occlusion. Existing works often design complex fusion models to handle challenging scenarios, but these models cannot adapt well to diverse challenges, which may limit tracking performance. To address this problem, we propose a novel Dynamic Disentangled Fusion Network, called DDFNet, which disentangles the fusion process into several dynamic fusion models according to the challenge attributes, so that it can adapt to various challenging scenarios for robust RGBT tracking. In particular, we design six attribute-based fusion models, each integrating RGB and thermal features under one of six challenging scenarios. Since each fusion model deals only with its corresponding challenge, such a disentangled fusion scheme can increase fusion capacity without depending on large-scale training data. Because each challenging scenario also varies in difficulty, we dynamically optimize the combination of multiple fusion units that forms each attribute-based fusion model, so that each model adapts to the difficulty of its corresponding scenario. To determine which fusion models should be activated during tracking, we design an adaptive aggregation fusion module that adaptively integrates the features from all attribute-based fusion models, together with a three-stage training algorithm. In addition, we design an enhancement fusion module to further strengthen the aggregated features and the modality-specific features. Experimental results on benchmark datasets demonstrate the effectiveness of DDFNet against other state-of-the-art methods.
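
The role of the adaptive aggregation fusion module, deciding how much each attribute-based fusion model contributes, can be sketched as gated weighting over seven branches (six attribute-based plus one general). This is a minimal illustration under stated assumptions, not the authors' module: the gate logits are taken as given inputs (in a trained network they would be predicted from the features), the optional `top_k` selection is an assumed way of "activating" only some branches, and features are flat lists.

```python
import math
from typing import List, Optional, Sequence

Feature = List[float]


def adaptive_aggregate(branch_feats: Sequence[Feature], gate_logits: Sequence[float],
                       top_k: Optional[int] = None) -> Feature:
    """Gate-weighted sum of branch features (hypothetical sketch).

    branch_feats: one feature vector per branch, e.g. six attribute-based
    branches plus one general branch. gate_logits: one score per branch,
    assumed to come from a learned gating network.
    """
    if len(branch_feats) != len(gate_logits):
        raise ValueError("one gate logit per branch is required")
    active = list(range(len(gate_logits)))
    if top_k is not None:
        # Keep only the top-k scoring branches (the "activated" branches).
        active = sorted(active, key=lambda i: gate_logits[i], reverse=True)[:top_k]
    m = max(gate_logits[i] for i in active)
    exps = [math.exp(gate_logits[i] - m) for i in active]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the active branches only
    out = [0.0] * len(branch_feats[0])
    for w, i in zip(weights, active):
        for d, value in enumerate(branch_feats[i]):
            out[d] += w * value
    return out


if __name__ == "__main__":
    feats = [[float(b), 2.0 * b] for b in range(7)]  # 7 toy branch features
    logits = [0.1, 0.2, 0.3, 0.4, 2.0, 0.5, 0.6]
    print(adaptive_aggregate(feats, logits, top_k=1))  # [4.0, 8.0]
    print(adaptive_aggregate(feats, logits, top_k=3))
```

Leaving `top_k` as `None` blends all branches; a small `top_k` keeps only the highest-scoring branches and renormalizes their weights over that subset.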