
Discrete Wavelet Transform-Based Capsule Network for Hyperspectral Image Classification

Zhiqiang Gao, Jiaqi Wang, Hangchi Shen, Zhihao Dou, Xiangbo Zhang, Kaizhu Huang

TL;DR

This paper introduces DWT-CapsNet, a Discrete Wavelet Transform-based Capsule Network for hyperspectral image classification that preserves spectral–spatial information while reducing computational burden. It combines an attentive DWT downsampling layer with a multi-scale, pyramid-inspired routing scheme and a sliding-window partial connection enhanced by self-attention to prune connections without sacrificing accuracy. The approach achieves state-of-the-art performance across four benchmark HSI datasets with lower FLOPs, fewer parameters, and faster inference compared with prior CapsNet and CNN baselines. The work demonstrates practical potential for real-world Earth observation tasks where both accuracy and efficiency are critical.

Abstract

Hyperspectral image (HSI) classification is a crucial remote sensing technique for building large-scale Earth monitoring systems. An HSI carries far more information than a conventional visual image for identifying land-cover categories. One recent solution for HSI classification is to use capsule networks (CapsNets) to capture spectral-spatial information. However, these methods incur a high computational cost because stacked capsule layers are fully connected. To solve this problem, DWT-CapsNet is proposed to identify partial but important connections in CapsNet for effective and efficient HSI classification. Specifically, we integrate a tailored attention mechanism into a Discrete Wavelet Transform (DWT)-based downsampling layer, alleviating the information loss caused by conventional downsampling operations in feature extractors. Moreover, we propose a novel multi-scale routing algorithm that prunes a large proportion of the connections in CapsNet. A capsule pyramid fusion mechanism aggregates the spectral-spatial relationships at multiple levels of granularity, and a self-attention mechanism is then applied within a partially and locally connected architecture to emphasize the meaningful relationships. As shown in the experimental results, our method achieves state-of-the-art accuracy while keeping computational demand lower in terms of running time, FLOPs, and number of parameters, making it an appealing choice for practical HSI classification.
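The attentive DWT downsampling described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single-level Haar wavelet, a single 2D feature map with even height and width, and attention scores passed in by hand (in the paper they would be produced by a learned attention module). The function names `haar_dwt2` and `attentive_dwt_downsample` are hypothetical, and the LH/HL naming convention varies between libraries.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array, H and W even.

    Returns four (H/2, W/2) sub-bands: LL, LH, HL, HH.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def attentive_dwt_downsample(x, scores):
    """Fuse the four sub-bands with softmax-normalised weights.

    `scores` holds four raw attention scores (assumption: supplied by the
    caller here, learned in a real network).
    """
    bands = np.stack(haar_dwt2(x))               # shape (4, H/2, W/2)
    scores = np.asarray(scores, dtype=float)
    e = np.exp(scores - scores.max())            # numerically stable softmax
    weights = e / e.sum()
    fused = np.tensordot(weights, bands, axes=1) # weighted sum over sub-bands
    return fused, weights

feature_map = np.arange(16.0).reshape(4, 4)
downsampled, band_weights = attentive_dwt_downsample(feature_map, [1.0, 0.5, 0.5, 0.0])
```

Unlike max or average pooling, the four sub-bands together retain all the information in the input (the transform is invertible), so the weighting decides how much high-frequency detail survives the halving of resolution.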

Paper Structure (21 sections, 21 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: Overall DWT-CapsNet processing: 1. The CNN backbone extracts feature maps from the HSI patches; attentive DWT pooling layers, embedded in the backbone, replace the traditional downsampling methods. 2. The extracted feature map is then fed into PrimaryCaps and converted into a capsule structure in layer $U^1$. The capsule network is composed of three capsule layers connected by multi-scale routing processes. 3. ClassCaps contains one capsule per class, and the capsule with the greatest length $||L||$ indicates the most probable class.
  • Figure 2: Illustration of the attentive DWT downsampling method. The DWT decomposes the feature map into four channels using the filters $f_{LL},f_{LH},f_{HL}$, and $f_{HH}$. Each channel is then assigned a weight, and the weighted channels generate the downsampled feature map.
  • Figure 3: The architecture of Pyramid Fusion in the multi-scale routing method. In the first routing process, the output $U^{0}$ of PrimaryCaps is used as the input to the pyramid. On each level of the pyramid, two adjacent prediction tensors $\hat{u}^{1,g-1}_{2p-1}, \hat{u}^{1,g-1}_{2p}$ are multiplied by the shared weights $w^{1,g-1}_{1}, w^{1,g-1}_{2}$ of the current level to generate the higher-level tensors. Finally, all tensors on the pyramid are concatenated in $\hat{U}^{1}$ and passed to the next layer.
  • Figure 4: Two connection methods: (a) Full connection: all low-level capsules in $\hat{U}^1$ take part in generating every high-level capsule. (b) Partial connection: only a subset of low-level capsules $\bar{U}_j^{1}$ takes part in generating the $j$-th high-level capsule.
  • Figure 5: Different models applied to the Kennedy Space Center dataset: (a) false-color image, (b) ground-truth label, (c) classification result with backbone, (d) classification result with backbone + attentive DWT downsampling, (e) classification result with backbone + multi-scale routing, (f) classification result with backbone + Capsule, (g) classification result with DWT-CapsNet.
  • ...and 4 more figures
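
The pyramid fusion of Figure 3 and the partial connection of Figure 4(b) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: the two weighted neighbours are assumed to be summed, the shared weights are assumed to be D×D matrices, an unpaired trailing capsule is dropped, window starts are spread evenly, and the self-attention step is omitted. The names `pyramid_fusion` and `sliding_windows` are hypothetical.

```python
import numpy as np

def pyramid_fusion(u0, level_weights):
    """Build a capsule pyramid and concatenate all of its levels.

    u0: (N, D) prediction vectors at the base of the pyramid.
    level_weights: list of (w1, w2) pairs, one per level, each of shape (D, D)
                   and shared by every pair on that level.
    """
    levels = [u0]
    current = u0
    for w1, w2 in level_weights:
        if current.shape[0] < 2:
            break                          # nothing left to pair up
        n = current.shape[0] // 2 * 2      # drop an unpaired capsule (assumption)
        left, right = current[0:n:2], current[1:n:2]
        current = left @ w1 + right @ w2   # fuse two adjacent capsules (assumed sum)
        levels.append(current)
    return np.concatenate(levels, axis=0)

def sliding_windows(num_low, num_high, window):
    """Index sets for partial connection: the j-th high-level capsule only
    sees `window` consecutive low-level capsules."""
    starts = np.linspace(0, num_low - window, num_high).round().astype(int)
    return [np.arange(s, s + window) for s in starts]

rng = np.random.default_rng(0)
dim = 4
base = rng.normal(size=(8, dim))
weights = [(np.eye(dim), np.eye(dim)) for _ in range(3)]

fused = pyramid_fusion(base, weights)          # 8 + 4 + 2 + 1 = 15 capsules
windows = sliding_windows(fused.shape[0], num_high=3, window=5)

full_connections = fused.shape[0] * 3
partial_connections = sum(len(idx) for idx in windows)
```

In this toy setting each high-level capsule is connected to 5 of the 15 low-level capsules, so the partial scheme keeps a third of the connections that a fully connected routing step would need.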