Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification

Chun Liu, Longwei Yang, Dongmei Dong, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang

TL;DR

The paper tackles few-shot hyperspectral image classification by addressing boundary-patch misclassification through APNT, which combines a Transformer-based feature extractor with a TransMix-inspired data augmentation strategy. By constructing class prototypes from support samples and generating attention-guided synthetic query samples, APNT enhances the discrimination of spatial-spectral features, particularly near object boundaries. The approach yields state-of-the-art results across multiple hyperspectral datasets, improves boundary-patch accuracy, and demonstrates robustness with or without large-scale pre-training, offering practical benefits for real-world remote sensing tasks. The combination of patch-level attention, online patch mixing, and label-aware synthetic samples provides a concrete, scalable path to reliable few-shot HSI classification.

Abstract

Few-shot hyperspectral image classification aims to identify the class of each pixel in an image when only a few pixels are labeled. To obtain joint spatial-spectral features for each pixel, fixed-size patches centered on each pixel are typically used for classification. However, examining the classification results of existing methods, we found that boundary patches, that is, patches centered on pixels located at object boundaries in the hyperspectral image, are hard to classify. These boundary patches mix spectral information from multiple classes. Motivated by this observation, we propose to augment the prototype network with TransMix for few-shot hyperspectral image classification (APNT). With the prototype network as its backbone, APNT adopts a transformer as the feature extractor to learn pixel-to-pixel relations and to pay different amounts of attention to different pixels. In addition, instead of training directly on patches cut from the hyperspectral images, it randomly mixes two patches to imitate boundary patches and trains the model on the resulting synthetic patches, with the aim of enlarging the number of hard training samples and increasing their diversity. Following the data augmentation technique TransMix, the attention returned by the transformer is also used to mix the labels of the two patches, generating better labels for the synthetic patches. Compared with existing methods, the proposed method demonstrates state-of-the-art performance and better robustness for few-shot hyperspectral image classification in our experiments.
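The mixing step described in the abstract can be illustrated with a minimal sketch. It assumes a binary mask decides which pixels come from the second patch, and that the label weight is the share of transformer attention falling on those pixels, in the spirit of TransMix. The function names, the mask-based mixing rule, and the toy attention values are illustrative assumptions, not the authors' implementation.

```python
def mix_patches(patch_a, patch_b, mask):
    """Pixel-wise mix: take the pixel from patch_b where mask is 1, else from patch_a.

    Each patch is a 2-D grid (list of rows); a pixel may be any object,
    for example a list of spectral band values.
    """
    return [
        [pb if m else pa for pa, pb, m in zip(row_a, row_b, row_m)]
        for row_a, row_b, row_m in zip(patch_a, patch_b, mask)
    ]


def attention_label_weight(attention, mask):
    """Fraction of the total attention that falls on pixels taken from patch_b.

    `attention` is a grid of non-negative per-pixel attention scores
    (assumed here to be returned by the feature extractor).
    """
    total = sum(sum(row) for row in attention)
    on_b = sum(
        a
        for row_a, row_m in zip(attention, mask)
        for a, m in zip(row_a, row_m)
        if m
    )
    return on_b / total


def mix_labels(label_a, label_b, weight_b):
    """Blend two one-hot labels into a soft label using the attention-derived weight."""
    return [(1 - weight_b) * ya + weight_b * yb for ya, yb in zip(label_a, label_b)]


# Toy 2x2 example: the right column of the synthetic patch comes from patch_b.
mask = [[0, 1], [0, 1]]
synthetic = mix_patches([["a1", "a2"], ["a3", "a4"]],
                        [["b1", "b2"], ["b3", "b4"]], mask)
attention = [[0.1, 0.4], [0.1, 0.4]]  # hypothetical attention scores
weight_b = attention_label_weight(attention, mask)
soft_label = mix_labels([1, 0], [0, 1], weight_b)
```

In this toy case the pixels from the second patch cover half of the area but receive most of the attention, so the soft label leans toward the second class rather than splitting evenly as an area-based mix would.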

Paper Structure (17 sections, 9 equations, 13 figures, 6 tables)

This paper contains 17 sections, 9 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: (a) The classification map, indicating that incorrect classification often occurs at boundary pixels; and (b) the overall accuracy of current methods on the SA dataset (green) and on the boundary pixels of the SA dataset (red).
  • Figure 2: The workflow of the APNT model. The patches sampled from the datasets are divided into a support set and a query set. Before the patches pass through the transformer, which extracts the joint spatial-spectral features, synthetic query samples are generated by randomly mixing two query samples from the query set. Once the features are obtained from the transformer, each class prototype is derived as the mean of the support features belonging to that class. Meanwhile, the labels of the synthetic query samples are generated by mixing the labels of the two query samples according to the attention returned by the transformer. Finally, the cross-entropy loss is calculated with the synthetic labels.
  • Figure 3: The transformer model for processing HSI patches, which outputs the spatial-spectral joint feature of each patch and the attentions reflecting the importance of different pixels in the patches.
  • Figure 4: The color map, label map and label colors of the Chikusei dataset.
  • Figure 5: The color map, label map and label colors of the IP dataset.
  • ...and 8 more figures