Hyperspectral Adapter for Object Tracking based on Hyperspectral Video

Long Gao, Yunhe Zhang, Langkun Chen, Yan Jiang, Weiying Xie, Yunsong Li

TL;DR

This work tackles hyperspectral object tracking by addressing spectral information loss and model inefficiency when adapting RGB trackers to HS data. It introduces HyA-T, a parameter-efficient framework built from a hyperspectral adapter for self-attention (HAS), a hyperspectral adapter for the MLP (HAM), and a hyperspectral enhancement of input (HEI), enabling effective HS tracking with only a small portion of trainable parameters. Through extensive experiments on HOTC and HOTC2024 across VIS, NIR, and RedNIR bands, HyA-T achieves state-of-the-art results and demonstrates strong generalization and robustness across spectral modalities. The approach offers a practical, spectral-information-preserving path to deploy HS trackers with high accuracy and reduced training cost.

Abstract

Object tracking based on hyperspectral video is attracting increasing attention because of the rich material and motion information that hyperspectral videos contain. Prevailing hyperspectral methods adapt pretrained RGB-based object tracking networks to hyperspectral tasks by fine-tuning the entire network on hyperspectral datasets, which achieves impressive results in challenging scenarios. However, the performance of these trackers is limited by the loss of spectral information during the transformation, and fine-tuning the entire pretrained network is inefficient for practical applications. To address these issues, this work proposes a new hyperspectral object tracking method, the hyperspectral adapter for tracking (HyA-T). The hyperspectral adapter for self-attention (HAS) and the hyperspectral adapter for the multilayer perceptron (HAM) generate adaptation information and inject it into the computations of the multi-head self-attention (MSA) module and the multilayer perceptron (MLP) of the pretrained network, thereby transferring both modules to the hyperspectral object tracking task. Additionally, the hyperspectral enhancement of input (HEI) augments the input of the tracking network with the original spectral information. The proposed modules extract spectral information directly from the hyperspectral images, which prevents the loss of spectral information. Moreover, only the parameters of the proposed modules are fine-tuned, which is more efficient than fine-tuning the entire network as existing methods do. Extensive experiments on four datasets with various spectral bands verify the effectiveness of the proposed methods, and HyA-T achieves state-of-the-art performance on all of them.
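The parameter-efficiency idea in the abstract (freeze the pretrained MLP, train only a small adapter beside it) can be illustrated with a minimal NumPy sketch. This is not the authors' HAM implementation: the bottleneck design, the zero initialization, the dimensions (768, 3072, rank 16), and every class and function name below are assumptions chosen for illustration, and the adapter here simply reads the same tokens as the MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class FrozenMLP:
    """Stand-in for a pretrained transformer MLP; its weights are never updated."""
    def __init__(self, dim, hidden):
        self.w1 = rng.normal(0.0, 0.02, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, dim))

    def __call__(self, x):
        return gelu(x @ self.w1) @ self.w2

    def num_params(self):
        return self.w1.size + self.w2.size

class BottleneckAdapter:
    """Hypothetical trainable adapter: down-project, nonlinearity, up-project."""
    def __init__(self, dim, rank):
        self.down = rng.normal(0.0, 0.02, (dim, rank))
        self.up = np.zeros((rank, dim))  # zero init: the adapter starts as a no-op

    def __call__(self, x):
        return gelu(x @ self.down) @ self.up

    def num_params(self):
        return self.down.size + self.up.size

def adapted_mlp(x, mlp, adapter):
    # Parallel placement (assumed): adapter output is added to the frozen MLP output.
    return mlp(x) + adapter(x)

dim, hidden, rank = 768, 3072, 16      # illustrative sizes, not from the paper
mlp = FrozenMLP(dim, hidden)
adapter = BottleneckAdapter(dim, rank)
tokens = rng.normal(size=(4, dim))     # four dummy tokens

out = adapted_mlp(tokens, mlp, adapter)
trainable_fraction = adapter.num_params() / (adapter.num_params() + mlp.num_params())
print(out.shape, f"trainable fraction: {trainable_fraction:.4f}")
```

With these illustrative sizes the adapter holds well under 1% of the block's MLP parameters, which is the kind of saving the abstract refers to when it says only the proposed modules are fine-tuned. The zero-initialized up-projection means the adapted block reproduces the pretrained output exactly before any training step.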

Paper Structure

This paper contains 35 sections, 14 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: (a) The pipeline of HyA-T. The overall structure is composed of HEI, a feature extraction network, and an autoregressive decoder. (b) The internal structure of a transformer block in the encoder, with the incorporation of HAS and HAM.
  • Figure 2: The structure of HAS applied to the query branch. The blue components represent the modules kept frozen during training, while the red components indicate the modules that are updated during training.
  • Figure 3: Schematic diagram of the operation of HAM. The blue parts are the modules that are frozen during training, and the red parts are updated during training. SHA denotes the adapter applied sequentially to the MLP. PHA denotes the adapter applied in parallel to the MLP.
  • Figure 4: The structure of HEI. SA represents the spectral attention module. IF stands for the image-level fusion module.
  • Figure 5: Comparisons of HyA-T and other SOTA HS trackers on the HOTC2024 dataset.
  • ...and 1 more figures
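
The Figure 4 caption names two parts of HEI: a spectral attention module (SA) and an image-level fusion module (IF). The captions do not give their internals, so the sketch below only shows one common way such a pair could work: a squeeze-and-excitation style reweighting of spectral bands, followed by a linear projection of the reweighted bands that is added to a three-channel input. The band count (16), the reduction ratio, the three-channel input, and all function names are assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spectral_attention(cube, w1, w2):
    """Reweight the bands of an (H, W, B) cube; returns the cube and the band weights."""
    squeezed = cube.mean(axis=(0, 1))          # global average per band, shape (B,)
    hidden = np.maximum(squeezed @ w1, 0.0)    # ReLU bottleneck, shape (B // r,)
    weights = sigmoid(hidden @ w2)             # per-band weights in (0, 1), shape (B,)
    return cube * weights, weights             # broadcast over H and W

def image_level_fusion(three_channel, cube, proj):
    """Hypothetical fusion: project B bands to 3 channels and add to the input image."""
    return three_channel + cube @ proj

rng = np.random.default_rng(0)
H, W, B = 8, 8, 16                             # illustrative sizes
cube = rng.random((H, W, B))                   # dummy hyperspectral frame
three_channel = rng.random((H, W, 3))          # dummy three-channel tracker input
w1 = rng.normal(0.0, 0.1, (B, B // 4))         # learnable in a real model
w2 = rng.normal(0.0, 0.1, (B // 4, B))
proj = rng.normal(0.0, 0.1, (B, 3))

weighted, band_weights = spectral_attention(cube, w1, w2)
enhanced = image_level_fusion(three_channel, weighted, proj)
print(enhanced.shape, band_weights.shape)
```

The enhanced array keeps the three-channel shape a pretrained RGB tracker expects, while its values now depend on all spectral bands, which is the general purpose the abstract attributes to HEI.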