IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks
Yaming Zhang, Chenqiang Gao, Fangcen Liu, Junjie Guo, Lan Wang, Xinggan Peng, Deyu Meng
TL;DR
This work proposes IV-tuning, which parameter-efficiently adapts Pre-trained Visual Models (PVMs) to various infrared-visible (IR-VIS) downstream tasks, including salient object detection, semantic segmentation, and object detection, and exhibits superior generalization and scalability.
Abstract
Existing infrared and visible (IR-VIS) methods inherit the general representations of Pre-trained Visual Models (PVMs) to facilitate complementary learning. However, our analysis indicates that under the full fine-tuning paradigm, the feature space becomes highly constrained and low-rank, which has been proven to seriously impair generalization. One remedy is to freeze the PVM parameters, which preserves pre-trained knowledge and helps maintain feature diversity. To this end, we propose IV-tuning, which parameter-efficiently harnesses PVMs for various IR-VIS downstream tasks, including salient object detection, semantic segmentation, and object detection. Extensive experiments across various settings demonstrate that IV-tuning outperforms previous state-of-the-art methods and exhibits superior generalization and scalability. Remarkably, using only a single backbone with merely 3% of its parameters trainable, IV-tuning effectively facilitates complementary learning of the infrared and visible modalities and achieves superior computational efficiency compared to conventional IR-VIS paradigms.
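
To give a rough sense of how a frozen backbone can leave only a few percent of parameters trainable, the following is a minimal back-of-the-envelope sketch. It is not the IV-tuning architecture, which the abstract does not specify. It assumes a generic ViT-style backbone and a generic bottleneck adapter (down-projection followed by up-projection); the function names, the adapter design, and the values `dim=768`, `depth=12`, `rank=64`, and two adapters per block are all hypothetical choices made only for illustration.

```python
# Illustrative parameter-count arithmetic for a frozen backbone plus small
# trainable adapters. All names and sizes here are assumptions for this
# sketch, not details taken from the IV-tuning paper.

def linear_params(d_in, d_out, bias=True):
    """Number of parameters in a dense layer d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)

def transformer_block_params(dim, mlp_ratio=4):
    """Parameters of one standard ViT-style block (attention + MLP + 2 LayerNorms)."""
    attn = linear_params(dim, 3 * dim) + linear_params(dim, dim)  # qkv + output proj
    hidden = mlp_ratio * dim
    mlp = linear_params(dim, hidden) + linear_params(hidden, dim)
    norms = 2 * (2 * dim)  # each LayerNorm has a scale and a shift vector
    return attn + mlp + norms

def bottleneck_adapter_params(dim, rank):
    """Parameters of a hypothetical adapter: dim -> rank -> dim."""
    return linear_params(dim, rank) + linear_params(rank, dim)

def trainable_fraction(dim, depth, rank, adapters_per_block=2):
    """Ratio of trainable adapter parameters to frozen backbone block parameters."""
    frozen = depth * transformer_block_params(dim)
    trainable = depth * adapters_per_block * bottleneck_adapter_params(dim, rank)
    return trainable / frozen

if __name__ == "__main__":
    frac = trainable_fraction(dim=768, depth=12, rank=64)
    print(f"trainable / frozen backbone parameters: {frac:.2%}")
```

Under these assumed sizes the ratio comes out at a few percent, the same order of magnitude as the figure quoted in the abstract. The actual ratio depends on the backbone and on the modules the authors insert.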
