DMPT: Decoupled Modality-aware Prompt Tuning for Multi-modal Object Re-identification

Minghui Lin; Shu Wang; Xiang Wang; Jianhua Tang; Longbin Fu; Zhengrong Zuo; Nong Sang

TL;DR

DMPT tackles the high computational and storage cost of multi-modal object ReID by freezing the backbone and optimizing only a compact set of decoupled modality-aware prompts. It combines modality-specific prompts and modality-independent semantic prompts with Prompt Inverse Bind (PromptIBind), a cross-modal interaction strategy that exchanges complementary information between modalities without corrupting modality-specific features. The approach yields competitive results on four benchmarks while tuning parameters amounting to only 6.5% of the backbone's parameter count. This work advances parameter-efficient fine-tuning in multi-modal perception by explicitly decoupling modalities and exchanging cross-modal semantics through a bind-based mechanism.

Abstract

Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (e.g., ViT) have made remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which optimizes a large number of backbone parameters and therefore incurs extensive computational and storage costs. In this work, we propose an efficient prompt-tuning framework tailored for multi-modal object re-identification, dubbed DMPT, which freezes the main backbone and optimizes only a small set of newly added decoupled modality-aware parameters. Specifically, we explicitly decouple the visual prompts into modality-specific prompts, which leverage prior modality knowledge from a powerful text encoder, and modality-independent semantic prompts, which extract semantic information from multi-modal inputs such as visible, near-infrared, and thermal-infrared images. Building on the extracted features, we further design a Prompt Inverse Bind (PromptIBind) strategy that uses bind prompts as a medium to connect the semantic prompt tokens of different modalities and facilitate the exchange of complementary multi-modal information, improving the final re-identification results. Experimental results on multiple common benchmarks demonstrate that DMPT achieves results competitive with existing state-of-the-art methods while tuning parameters amounting to only 6.5% of the backbone's parameter count.
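To make the efficiency claim concrete, the following is a minimal sketch of the parameter accounting behind a frozen-backbone setup. It is not the authors' code. The backbone size (86 million parameters, roughly a ViT-B) and the single "added prompt parameters" group are assumptions chosen only so that the ratio matches the 6.5% figure quoted above; the abstract does not state the backbone size or how the tuned parameters are split among the different prompt types. The names `ParamGroup`, `tunable_ratio`, and `checkpoint_megabytes` are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    name: str
    count: int        # number of scalar parameters in the group
    trainable: bool   # False means the group stays frozen during tuning


def tunable_ratio(groups):
    """Trainable parameter count relative to the frozen parameter count."""
    frozen = sum(g.count for g in groups if not g.trainable)
    tuned = sum(g.count for g in groups if g.trainable)
    return tuned / frozen


def checkpoint_megabytes(groups, bytes_per_param=4):
    """Per-task checkpoint size if only the trainable groups are saved."""
    tuned = sum(g.count for g in groups if g.trainable)
    return tuned * bytes_per_param / 1e6


# ASSUMPTION: both counts are illustrative, not taken from the paper.
# 86M approximates a ViT-B backbone; 5.59M is picked to give 6.5%.
groups = [
    ParamGroup("backbone", 86_000_000, trainable=False),
    ParamGroup("added prompt parameters", 5_590_000, trainable=True),
]

ratio = tunable_ratio(groups)
print(f"tuned / backbone = {ratio:.1%}")
print(f"per-task checkpoint = {checkpoint_megabytes(groups):.2f} MB (fp32)")
```

Under these assumed sizes, only the prompt group would need to be optimized and stored per task, while the frozen backbone can be shared. This is the source of the computational and storage savings that the abstract contrasts with full fine-tuning.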
