Parameter-Efficient Fine-Tuning of Multispectral Foundation Models for Hyperspectral Image Classification

Bernardin Ligan, Khalide Jbilou, Fahd Kalloubi, Ahmed Ratnani

TL;DR

This work tackles hyperspectral image classification (HSIC) by repurposing a multispectral foundation model, SpectralGPT, and fine-tuning it efficiently for hyperspectral tasks. It systematically compares several Parameter-Efficient Fine-Tuning (PEFT) methods and introduces KronA+, the most effective of those tested, which combines Kronecker-based updates with a LoRA+-inspired learning-rate scheme. KronA+ achieves competitive accuracy while training only ~0.056% of the parameters and adding ~0.2 MB of storage. Across five datasets from different sensors, it is competitive with dedicated hyperspectral backbones, underscoring the cost-performance benefits of PEFT in remote sensing. The results suggest that a dedicated hyperspectral foundation model does not always outperform a well-tuned multispectral model with lightweight adapters, which offers practical guidance for deploying hyperspectral models in resource-constrained environments.

Abstract

Foundation models have achieved great success across diverse domains, including remote sensing (RS), thanks to their versatility and strong generalization abilities. However, most RS foundation models are designed for multispectral data, while hyperspectral imagery (HSI), with its hundreds of spectral bands, remains less explored. Fine-tuning such models for downstream tasks is also challenging, often demanding considerable memory and storage. In this paper, we propose an efficient framework to fine-tune SpectralGPT, a multispectral foundation model, for hyperspectral image classification (HSIC). We explore several Parameter-Efficient Fine-Tuning (PEFT) methods, including Low-Rank Adaptation (LoRA), Kronecker-based adaptation (KronA), Low-Rank Kronecker (LoKr), and the recent LoRA+, which uses distinct learning rates for low-rank adapters scaled by a factor $\lambda$. Inspired by LoRA+, we introduce KronA+, which applies a similar mechanism to the Kronecker matrices. We evaluate our approach on five datasets from different sensors, showing competitive performance with state-of-the-art HSI models. Our full fine-tuning (FFT) setup for SpectralGPT even outperforms a dedicated hyperspectral foundation model on some datasets while requiring only a quarter of the training epochs. Under the same number of epochs, KronA+ reaches similar performance with far fewer trainable parameters (just 0.056%) and adds only approximately 0.2 MB of storage, making it the most effective PEFT method tested.
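
The KronA+ idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection width (768), the factor shapes (48 x 48 and 16 x 16), the zero initialization of $B$, the values of the base learning rate and $\lambda$, and the helper names `krona_delta`, `param_groups`, and `sgd_step` are all assumptions chosen for illustration. The only details taken from the paper are that the weight update is a Kronecker product of two small matrices and that, per the Figure 2 caption, the learning rate of $B$ is $\lambda$ times that of $A$.

```python
import numpy as np

def krona_delta(A, B):
    """Kronecker-structured weight update: delta_W = A ⊗ B."""
    return np.kron(A, B)

def param_groups(lr, lam):
    """Per-factor learning rates: A uses lr, B uses lam * lr (lam >> 1)."""
    return {"A": lr, "B": lam * lr}

def sgd_step(params, grads, lrs):
    """Plain SGD with a separate learning rate for each factor."""
    return {name: params[name] - lrs[name] * grads[name] for name in params}

rng = np.random.default_rng(0)
d = 768                                    # hypothetical projection width
W_frozen = rng.normal(size=(d, d))         # stand-in for a frozen Q or V weight

params = {
    "A": rng.normal(scale=0.02, size=(48, 48)),  # hypothetical factor shape
    "B": np.zeros((16, 16)),                     # assumed zero init, so delta_W starts at 0
}

delta_W = krona_delta(params["A"], params["B"])  # shape (48*16, 48*16) = (768, 768)
W_adapted = W_frozen + delta_W                   # frozen weight plus adapter update

trainable = sum(p.size for p in params.values())
print(f"trainable: {trainable} of {W_frozen.size} weights in this matrix")

lrs = param_groups(lr=1e-4, lam=16.0)            # both values are hypothetical
grads = {name: np.ones_like(p) for name, p in params.items()}  # dummy gradients
updated = sgd_step(params, grads, lrs)
```

In a real training loop the same effect would typically be obtained by passing two parameter groups with different learning rates to the optimizer; the explicit SGD step here only makes the ratio between the two rates visible.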

Paper Structure

This paper contains 35 sections, 12 equations, 14 figures, 14 tables.

Figures (14)

  • Figure 1: An overview of the pipeline used to fine-tune the SpectralGPT model for hyperspectral image classification. It begins with the HSI cube, which undergoes dimensionality reduction via PCA to produce a reduced cube of shape $H \times W \times 12$. This cube is divided into overlapping 3D patches, which are then resized and normalized. Each patch is subsequently split into overlapping $8 \times 8 \times 3$ 3D tokens for input into SpectralGPT. SpectralGPT processes these tokens through transformer encoder blocks, followed by an average pooling layer (AvgPool) and an MLP head to produce the classification map. When full fine-tuning is not used, PEFT methods can be applied at the level of each transformer encoder block.
  • Figure 2: The Parameter-Efficient Fine-Tuning (PEFT) strategies used in the study: LoRA, KronA, and LoKr are applied to the $Q$ and $V$ components of each transformer layer to adapt the frozen pre-trained weights with minimal additional parameters. The rectangle at the bottom-right highlights the parameter update strategy used in the methods of LoRA+ and KronA+, where instead of using the same learning rate for $A$ and $B$ as in LoRA and KronA, the learning rate of $B$ is set to be $\lambda \times$ that of $A$, with $\lambda \gg 1$ fixed.
  • Figure 3: Visualization of the Indian Pines dataset. (a) False-color image. (b) Ground truth (GT).
  • Figure 4: Visualization of the Pavia dataset. (a) False-color image. (b) Ground truth (GT).
  • Figure 5: Visualization of the Houston dataset. (a) False-color image. (b) Ground truth (GT).
  • ...and 9 more figures
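
The preprocessing steps named in the Figure 1 caption (PCA to 12 bands, overlapping patches, resizing and normalization, overlapping $8 \times 8 \times 3$ tokens) can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the authors' code: the patch size (15), the resized patch size (32), nearest-neighbour resizing, per-patch standardization, reflect padding, the token strides (4, 4, 3), and all function names are hypothetical, since the caption does not specify them. The input cube is synthetic random data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pca_reduce(cube, n_components=12):
    """Project an (H, W, bands) cube onto its top principal components."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(np.float64)
    flat -= flat.mean(axis=0)                      # centre each band
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_components].T).reshape(h, w, n_components)

def extract_patches(cube, patch=15):
    """Return one (patch, patch, C) window centred on every pixel."""
    r = patch // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    windows = sliding_window_view(padded, (patch, patch), axis=(0, 1))
    return np.moveaxis(windows, 2, -1)             # (H, W, patch, patch, C)

def resize_and_normalize(patch, size=32, eps=1e-8):
    """Nearest-neighbour resize to (size, size, C), then standardize."""
    idx = np.arange(size) * patch.shape[0] // size
    resized = patch[idx][:, idx]
    return (resized - resized.mean()) / (resized.std() + eps)

def tokenize(patch, token=(8, 8, 3), stride=(4, 4, 3)):
    """Split a patch into 3D tokens; a stride below the token size overlaps."""
    windows = sliding_window_view(patch, token)
    windows = windows[::stride[0], ::stride[1], ::stride[2]]
    return windows.reshape(-1, *token)

rng = np.random.default_rng(0)
hsi = rng.random((20, 20, 50))                     # synthetic stand-in for an HSI cube
reduced = pca_reduce(hsi)                          # (20, 20, 12)
patches = extract_patches(reduced)                 # (20, 20, 15, 15, 12)
tokens = tokenize(resize_and_normalize(patches[10, 10]))
print(reduced.shape, patches.shape, tokens.shape)
```

With these hypothetical settings the tokens overlap spatially but not spectrally; the actual strides and sizes used with SpectralGPT may differ.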