BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking

Hanzheng Wang; Wei Li; Xiang-Gen Xia; Qian Du

TL;DR

The paper tackles Hyperspectral Camouflaged Object Tracking (HCOT) by introducing BihoT, a large-scale dataset of 41,912 hyperspectral images across 49 video sequences, each with 25 bands, designed to stress spectral discrimination over visual cues. It proposes SPDAN, a baseline that fuses spectral information via a Spectral Embedding Network (SEN), a Spectral Prompt-based Backbone Network (SPBN) with cross-modality adapters, and a Distractor-aware Module (DAM) that handles occlusions and background distractors; the visual Transformer backbone is kept frozen for efficiency. Experiments show that SPDAN achieves state-of-the-art performance on BihoT and on existing HOT datasets, ablations confirm the contribution of SEN and DAM, and cross-dataset tests indicate good generalization. The work highlights the practical importance of spectral information in camouflaged-tracking scenarios and points to incorporating temporal dynamics as future work.

Abstract

Hyperspectral object tracking (HOT) has exhibited potential in various applications, particularly in scenes where objects are camouflaged. Existing trackers can effectively retrieve objects via band regrouping because of the bias in existing HOT datasets, where most objects tend to have distinguishing visual appearances rather than spectral characteristics. This bias allows the tracker to directly use the visual features obtained from the false-color images generated by hyperspectral images without the need to extract spectral features. To tackle this bias, we find that the tracker should focus on the spectral information when object appearance is unreliable. Thus, we provide a new task called hyperspectral camouflaged object tracking (HCOT) and meticulously construct a large-scale HCOT dataset, termed BihoT, which consists of 41,912 hyperspectral images covering 49 video sequences. The dataset covers various artificial camouflage scenes where objects have similar appearances, diverse spectrums, and frequent occlusion, making it a very challenging dataset for HCOT. Besides, a simple but effective baseline model, named spectral prompt-based distractor-aware network (SPDAN), is proposed, comprising a spectral embedding network (SEN), a spectral prompt-based backbone network (SPBN), and a distractor-aware module (DAM). Specifically, the SEN extracts spectral-spatial features via 3-D and 2-D convolutions. Then, the SPBN fine-tunes powerful RGB trackers with spectral prompts and alleviates the insufficiency of training samples. Moreover, the DAM utilizes a novel statistic to capture the distractor caused by occlusion from objects and background. Extensive experiments demonstrate that our proposed SPDAN achieves state-of-the-art performance on the proposed BihoT and other HOT datasets.

Paper Structure (21 sections, 17 equations, 9 figures, 13 tables)


Introduction
Related work
HS Object Tracker
HOT Datasets
Visual Object Tracker
RGB/RGBT Tracking Datasets and Evaluation Metrics
BihoT Dataset
Image Collection
Data Annotation and Condition Challenges
Data Statistics
Proposed Method
Spectral Prompt-based Backbone Network
Distractor-aware Module
Experiments
Experiment Settings
...and 6 more sections

Figures (9)

Figure 1: Differences between the BihoT dataset and the HOTC-2020 dataset. The object in the green box is a real kiwi, while the object in the red box is a fake kiwi, considered a camouflaged object. Data value refers to the value of a pixel, representing the intensity of the spectral reflectance curve.
Figure 2: Illustration of the proposed BihoT dataset. (a) Examples of spectral distinguishable (s-dis) factors from the BihoT dataset. (b) Examples of false-color images of the Kiwifruit3, Chill2, and Lemon2 video sequences from the BihoT dataset.
Figure 3: Illustration of the overall structure of our proposed SPDAN, including the spectral prompt-based backbone network (SPBN) and distractor-aware module (DAM). Specifically, SPBN contains four main modules, i.e., spectral embedding network (SEN), cross-modality adapter (CA), visual Transformer backbone (VTB), and head network (HN).
Figure 4: Illustration of the structure of CA.
Figure 5: Visualization of the decision confidence (DC) and the corresponding classification map (CM) of the basketball video sequence on the HOTC-2020 dataset. The line graph above represents the change in DC for each frame. It can be observed that when DC is below the threshold (i.e., frame #0055), multiple local extreme points in the CM appear, and the tracking results become unreliable. OD denotes the original images.
...and 4 more figures
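
The behavior described in the Figure 5 caption (a low decision confidence coinciding with several local extreme points in the classification map) can be illustrated with a small sketch. This summary does not define the statistic used by the DAM, so `stand_in_confidence` below is a hypothetical substitute (one minus the ratio of the second-highest peak to the highest peak), and `local_peaks`, `is_reliable`, `rel_floor`, and the threshold value of 0.3 are likewise assumptions made only for illustration. Scores are assumed to be non-negative.

```python
from typing import List, Tuple

Map = List[List[float]]


def local_peaks(cm: Map, rel_floor: float = 0.5) -> List[Tuple[int, int, float]]:
    """Find cells strictly greater than all 8 neighbours and at least
    rel_floor * (global maximum). Returned sorted by value, highest first."""
    rows, cols = len(cm), len(cm[0])
    top = max(max(row) for row in cm)
    peaks = []
    for i in range(rows):
        for j in range(cols):
            v = cm[i][j]
            if v < rel_floor * top:
                continue  # too weak to count as a competing response
            neighbours = [
                cm[a][b]
                for a in range(max(0, i - 1), min(rows, i + 2))
                for b in range(max(0, j - 1), min(cols, j + 2))
                if (a, b) != (i, j)
            ]
            if all(v > n for n in neighbours):
                peaks.append((i, j, v))
    return sorted(peaks, key=lambda p: -p[2])


def stand_in_confidence(cm: Map) -> float:
    """Hypothetical stand-in for decision confidence (NOT the paper's statistic).

    Returns 0.0 for a map with no peak, 1.0 for a single peak, and otherwise
    1 - (second peak / main peak), which shrinks as a distractor peak grows."""
    peaks = local_peaks(cm)
    if not peaks:
        return 0.0
    if len(peaks) == 1:
        return 1.0
    return 1.0 - peaks[1][2] / peaks[0][2]


def is_reliable(cm: Map, threshold: float = 0.3) -> bool:
    """Flag a frame as reliable when the stand-in confidence meets the threshold."""
    return stand_in_confidence(cm) >= threshold


# One clear response: a single peak at the centre.
single_peak = [
    [0.0, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.1, 0.0],
]

# Target plus a similar-looking distractor: two strong peaks.
two_peaks = [
    [0.1, 0.1, 0.0, 0.1, 0.1],
    [0.1, 0.9, 0.0, 0.8, 0.1],
    [0.1, 0.1, 0.0, 0.1, 0.1],
]

for name, cm in (("single_peak", single_peak), ("two_peaks", two_peaks)):
    print(name, len(local_peaks(cm)), round(stand_in_confidence(cm), 3), is_reliable(cm))
```

In this toy setting the map with two comparable peaks falls below the threshold and is flagged as unreliable, mirroring the qualitative pattern in the caption. How the DAM acts on such frames is not specified in this summary and is left out of the sketch.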
