Optimizing Multi-Modality Trackers via Sensitivity-regularized Tuning
Zhiwen Chen, Jinjian Wu, Zhiyu Zhu, Yifan Zhang, Guangming Shi, Junhui Hou
TL;DR
This work addresses the misfitting that arises when RGB-pretrained trackers are adapted to multi-modal tracking tasks. It introduces sensitivity-regularized fine-tuning (SRFT), which uses two intrinsic parameter sensitivities to regulate gradient updates during cross-modal fine-tuning: prior sensitivity (via a Fisher Information Matrix tangent-space analysis) and transfer sensitivity (via gradient sparsity metrics). A dynamic schedule controlled by $\kappa$ balances preserving pre-trained knowledge against adapting to new modalities, yielding a low-rank, tangent-space-constrained optimization that improves transferability across RGB-Event, RGB-Depth, and RGB-Thermal benchmarks. Extensive experiments show that SRFT consistently surpasses state-of-the-art methods on seven benchmarks and is compatible with existing transfer-learning paradigms, which makes it practical for robust cross-modal visual tracking. The approach offers a principled, data-informed way to navigate the plasticity-stability trade-off in cross-domain transfer, with potential applicability to other multi-modal perception tasks.
Abstract
This paper tackles the critical challenge of optimizing multi-modality trackers by effectively adapting models pre-trained on RGB data. Existing fine-tuning paradigms oscillate between excessive freedom and over-restriction, both of which lead to a suboptimal plasticity-stability trade-off. To mitigate this dilemma, we propose a novel sensitivity-regularized fine-tuning framework, which refines the learning process by incorporating intrinsic parameter sensitivities. Through a comprehensive investigation of the transition from pre-trained to multi-modal contexts, we identify parameters sensitive to pivotal foundational patterns and to cross-domain shifts as the primary drivers of this issue. Specifically, we first probe the tangent space of the pre-trained weights to measure and orient prior sensitivities, with the aim of preserving generalization. Subsequently, we characterize transfer sensitivities during the tuning phase, emphasizing adaptability and stability. By incorporating these sensitivities as unified regularization terms, our method significantly enhances transferability across modalities. Extensive experiments demonstrate the superior performance of our method, which surpasses current state-of-the-art techniques across various multi-modality tracking benchmarks. The source code and models will be publicly available at https://github.com/zhiwen-xdu/SRTrack.
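
To make the idea of sensitivity-weighted regularization concrete, the following is a minimal NumPy sketch and not the authors' implementation. It substitutes a simpler, well-known technique: a diagonal empirical Fisher estimate used as an elastic (EWC-style) penalty that pulls sensitive parameters back toward their pre-trained values. The function names, the linear $\kappa$ schedule, and the penalty form are all assumptions made for illustration; the tangent-space analysis and the transfer sensitivity described above are not modeled here.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients.

    Assumption: used here as a stand-in for 'prior sensitivity'.
    """
    g = np.asarray(per_sample_grads, dtype=float)
    return np.mean(g ** 2, axis=0)

def kappa_schedule(step, total_steps, kappa_start=1.0, kappa_end=0.0):
    """Hypothetical linear schedule for the balance coefficient kappa."""
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return kappa_start + t * (kappa_end - kappa_start)

def regularized_step(w, w0, grad, prior_sens, kappa, lr=0.1):
    """One gradient step on: task_loss + (kappa / 2) * sum(prior_sens * (w - w0)**2).

    w0 holds the pre-trained weights; parameters with high prior_sens are
    pulled back toward w0 more strongly (EWC-style penalty, an assumption).
    """
    penalty_grad = kappa * prior_sens * (w - w0)
    return w - lr * (grad + penalty_grad)

# Toy usage: two parameters, only the first is marked as sensitive.
w0 = np.array([1.0, 1.0])
prior_sens = np.array([1.0, 0.0])
task_grad = np.array([1.0, 1.0])  # constant task gradient for illustration

w = w0.copy()
for _ in range(2):
    w = regularized_step(w, w0, task_grad, prior_sens, kappa=1.0)

drift = np.abs(w - w0)  # the sensitive parameter moves less than the free one
```

In this toy run the sensitive parameter drifts less from its pre-trained value than the unconstrained one, which is the stability half of the plasticity-stability trade-off; lowering `kappa` over training, as in the hypothetical schedule, would gradually release that constraint.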
