Dynamic Cross-Modal Feature Interaction Network for Hyperspectral and LiDAR Data Classification

Junyan Lin, Feng Gao, Lin Qi, Junyu Dong, Qian Du, Xinbo Gao

TL;DR

This work addresses joint hyperspectral and LiDAR data classification by introducing DCMNet, a dynamic routing-based framework that learns data-dependent cross-modal feature fusion. It defines three feature interaction blocks (BSAB for spatial interactions, BCAB for spectral-channel interactions, and ICB for discriminative feature interactions) and connects them in a three-layer routing space that adaptively selects computation paths per input. In experiments on the Trento, Houston 2013, and Houston 2018 datasets, DCMNet outperforms multiple state-of-the-art methods, and ablations confirm the contribution of dynamic routing and bilinear attention. The approach advances cross-modal fusion by enabling adaptive, data-aware feature integration for land-cover classification.

Abstract

Joint classification of hyperspectral image (HSI) and LiDAR data is a challenging task. Existing multi-source remote sensing data classification methods often rely on human-designed frameworks for feature extraction, which depend heavily on expert knowledge. To address this limitation, we propose a novel Dynamic Cross-Modal Feature Interaction Network (DCMNet), the first framework leveraging a dynamic routing mechanism for HSI and LiDAR classification. Specifically, our approach introduces three feature interaction blocks: Bilinear Spatial Attention Block (BSAB), Bilinear Channel Attention Block (BCAB), and Integration Convolutional Block (ICB). These blocks are designed to enhance spatial, spectral, and discriminative feature interactions, respectively. A multi-layer routing space with routing gates is designed to determine optimal computational paths, enabling data-dependent feature fusion. Additionally, bilinear attention mechanisms are employed to enhance feature interactions in spatial and channel representations. Extensive experiments on three public HSI and LiDAR datasets demonstrate the superiority of DCMNet over state-of-the-art methods. Our code will be available at https://github.com/oucailab/DCMNet.
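
To make the idea of routing gates concrete, the following is a minimal NumPy sketch of one routed layer, not the authors' implementation. It assumes a soft gate (global average pooling, a linear map, then a softmax) that weights the outputs of three candidate blocks; the gate form, the names `routing_gate` and `routed_layer`, and the stand-in blocks are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def routing_gate(feat, w, b):
    """Hypothetical soft gate. feat: (C, H, W); w: (num_paths, C); b: (num_paths,)."""
    pooled = feat.mean(axis=(1, 2))   # global average pooling -> (C,)
    return softmax(w @ pooled + b)    # one weight per candidate path

def routed_layer(feat, blocks, w, b):
    """Weight each candidate block's output by its data-dependent gate value."""
    gates = routing_gate(feat, w, b)
    out = sum(g * blk(feat) for g, blk in zip(gates, blocks))
    return out, gates

# Stand-in blocks (assumptions); in DCMNet these would be BSAB, BCAB, and ICB.
blocks = [lambda x: x, lambda x: np.tanh(x), lambda x: np.maximum(x, 0.0)]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 5))    # toy feature map: 8 channels, 5x5 patch
w = rng.standard_normal((3, 8)) * 0.1    # gate weights (learned in a real model)
b = np.zeros(3)

out, gates = routed_layer(feat, blocks, w, b)
```

Because the gate values depend on the input features, different inputs weight the three blocks differently, which is the sense in which the fusion path is data-dependent. The actual gate function and path-selection rule should be taken from the paper or the released code.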

Paper Structure

This paper contains 17 sections, 15 equations, 10 figures, 14 tables.

Figures (10)

  • Figure 1: Comparison of traditional methods with the proposed DCMNet. (a) Existing multi-source feature fusion methods. Their static frameworks lack the adaptability to handle ground objects with semantic diversity. (b) The proposed DCMNet. A dynamic routing mechanism is introduced to learn data-dependent feature extraction paths.
  • Figure 2: The proposed Dynamic Cross-Modal Feature Interaction Network (DCMNet). It comprises a hyperspectral feature encoder, a LiDAR feature encoder, and a routing space. In the 3-layer routing space, we design three feature interaction blocks and connect them in a fully connected manner. The details of each block are shown in the rounded rectangle of the corresponding color.
  • Figure 3: Illustration of the bilinear cross-attention. Second-order feature interactions are employed for feature extraction. In BSAB, the attention matrix between query and value is computed along the spatial dimension. In BCAB, the attention matrix between query and value is computed along the channel dimension.
  • Figure 4: The number of principal components in HSI versus the classification accuracy.
  • Figure 5: The size of input image patch versus the classification accuracy.
  • ...and 5 more figures
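
The distinction drawn in the Figure 3 caption, between attention computed along the spatial dimension and along the channel dimension, can be sketched as follows. This is a minimal NumPy illustration using ordinary scaled dot-product cross-attention with a query/key/value split, which is an assumption and not the paper's exact bilinear formulation; the function names and the choice of which modality supplies the query are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(q, k, v):
    """q, k, v: (C, N) with N = H * W. The attention matrix is (N, N)."""
    c = q.shape[0]
    attn = softmax(q.T @ k / np.sqrt(c), axis=-1)  # each query position attends over positions
    return v @ attn.T                              # (C, N)

def channel_cross_attention(q, k, v):
    """q, k, v: (C, N). The attention matrix is (C, C)."""
    n = q.shape[1]
    attn = softmax(q @ k.T / np.sqrt(n), axis=-1)  # each query channel attends over channels
    return attn @ v                                # (C, N)

rng = np.random.default_rng(0)
hsi = rng.standard_normal((4, 9))     # toy HSI features: 4 channels, 3x3 patch flattened
lidar = rng.standard_normal((4, 9))   # toy LiDAR features with the same shape

spatial_out = spatial_cross_attention(hsi, lidar, lidar)
channel_out = channel_cross_attention(hsi, lidar, lidar)
```

Both variants return a feature map of the same shape as the input; they differ only in whether the attention weights relate positions (N x N) or channels (C x C).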