Cosine-Normalized Attention for Hyperspectral Image Classification

Muhammad Ahmad, Manuel Mazzara

Abstract

Transformer-based methods have improved hyperspectral image classification (HSIC) by modeling long-range spatial-spectral dependencies; however, their attention mechanisms typically rely on dot-product similarity, which mixes feature magnitude and orientation and may be suboptimal for hyperspectral data. This work revisits attention scoring from a geometric perspective and introduces a cosine-normalized attention formulation that aligns similarity computation with the angular structure of hyperspectral signatures. By projecting query and key embeddings onto a unit hypersphere and applying a squared cosine similarity, the proposed method emphasizes angular relationships while reducing sensitivity to magnitude variations. The formulation is integrated into a spatial-spectral Transformer and evaluated under extremely limited supervision. Experiments on three benchmark datasets demonstrate that the proposed approach consistently achieves higher performance, outperforming several recent Transformer- and Mamba-based models despite using a lightweight backbone. In addition, a controlled analysis of multiple attention score functions shows that cosine-based scoring provides a reliable inductive bias for hyperspectral representation learning.
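
For concreteness, below is a minimal sketch of the squared-cosine attention scoring described in the abstract, written in PyTorch. The temperature `tau`, the softmax over scores, and the tensor layout are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def squared_cosine_attention(q, k, v, tau=1.0):
    """Cosine-normalized attention with a squared cosine score.

    q, k, v: (..., tokens, dim) tensors; tau is an assumed temperature.
    """
    # Project queries and keys onto the unit hypersphere (l2 normalization),
    # removing magnitude so the score depends only on orientation.
    q_hat = F.normalize(q, p=2, dim=-1)
    k_hat = F.normalize(k, p=2, dim=-1)
    # Cosine similarity for every query/key pair; values lie in [-1, 1].
    cos = q_hat @ k_hat.transpose(-2, -1)
    # Squaring sharpens strongly aligned pairs relative to weakly aligned ones
    # (note it also scores aligned and anti-aligned pairs identically).
    scores = cos.pow(2) / tau
    return scores.softmax(dim=-1) @ v

# Usage with hypothetical shapes: a batch of 2 sequences of 16 tokens, 64-dim.
q = k = v = torch.randn(2, 16, 64)
out = squared_cosine_attention(q, k, v)  # (2, 16, 64)
```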

Paper Structure

This paper contains 9 sections, 6 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Standard dot-product attention mixes vector magnitude and angular alignment through $\mathbf{q}^{\top}\mathbf{k}$. After $\ell_2$ normalization, by contrast, the embeddings lie on the unit hypersphere and the score depends only on angular similarity. The proposed squared cosine score further sharpens the distinction between strongly aligned and weakly aligned token pairs.
  • Figure 2: The HSI cube is partitioned into 3D patches, embedded into a token matrix, projected into query, key, and value representations, normalized, and scored using a cosine-based overlap function (a tokenization sketch follows this list).
  • Figure 3: Visual comparison of classification maps on SA.
  • Figure 4: Visual comparison of classification maps on TD.
  • Figure 5: Visual comparison of classification maps on HH.
  • ...and 1 more figure
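
Following the pipeline summarized in the Figure 2 caption, the sketch below illustrates one plausible tokenization step: the HSI cube is split into non-overlapping 3D patches, each flattened and linearly embedded into a token. The patch size, embedding width, and non-overlapping layout are assumptions for illustration; the paper's actual patch embedding may differ.

```python
import torch
import torch.nn as nn

class PatchTokenizer(nn.Module):
    """Split an HSI cube into 3D patches and embed each patch as a token."""

    def __init__(self, bands, patch=7, dim=64):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(bands * patch * patch, dim)

    def forward(self, cube):
        # cube: (bands, H, W); unfold into non-overlapping patch x patch windows.
        p = self.patch
        x = cube.unfold(1, p, p).unfold(2, p, p)        # (bands, H//p, W//p, p, p)
        x = x.permute(1, 2, 0, 3, 4)                    # (H//p, W//p, bands, p, p)
        x = x.reshape(-1, cube.shape[0] * p * p)        # (num_patches, bands*p*p)
        return self.embed(x)                            # token matrix: (num_patches, dim)

# Usage with a hypothetical 200-band cube of 140 x 140 pixels.
cube = torch.randn(200, 140, 140)
tokens = PatchTokenizer(bands=200)(cube)  # (400, 64): 20 x 20 patches, 64-dim tokens
```

The resulting token matrix is what the query, key, and value projections in Figure 2 would consume before the cosine-based scoring step.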