HSIGene: A Foundation Model For Hyperspectral Image Generation

Li Pang, Xiangyong Cao, Datao Tang, Shuang Xu, Xueru Bai, Feng Zhou, Deyu Meng

TL;DR

HSIGene is a novel HSI generation foundation model based on latent diffusion. It supports multi-condition control for more precise and reliable HSI generation, and it is trained with a super-resolution-based data augmentation method that enhances the spatial diversity of the training data while preserving spectral fidelity.

Abstract

Hyperspectral images (HSIs) play a vital role in various fields such as agriculture and environmental monitoring. However, due to their high acquisition cost, the number of available HSIs is limited, which degrades the performance of downstream tasks. Although some recent studies have attempted to employ diffusion models to synthesize HSIs, they still struggle with the scarcity of HSIs, which affects the reliability and diversity of the generated images. Some studies propose incorporating multi-modal data to enhance spatial diversity, but spectral fidelity cannot be ensured. In addition, existing HSI synthesis models are typically uncontrollable or support only single-condition control, limiting their ability to generate accurate and reliable HSIs. To alleviate these issues, we propose HSIGene, a novel HSI generation foundation model that is based on latent diffusion and supports multi-condition control, allowing for more precise and reliable HSI generation. To enhance the spatial diversity of the training data while preserving spectral fidelity, we propose a new data augmentation method based on spatial super-resolution, in which HSIs are first upscaled so that abundant training patches can be obtained by cropping the high-resolution HSIs. In addition, to improve the perceptual quality of the augmented data, we introduce a novel two-stage HSI super-resolution framework, which first applies super-resolution to the RGB bands and then uses our proposed Rectangular Guided Attention Network (RGAN) for guided HSI super-resolution. Experiments demonstrate that the proposed model is capable of generating a vast quantity of realistic HSIs for downstream tasks such as denoising and super-resolution. The code and models are available at https://github.com/LiPang/HSIGene.
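
The augmentation idea described in the abstract, upscaling an HSI and then cropping patches from the enlarged cube, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `upscale_nearest` is a hypothetical stand-in that uses nearest-neighbor replication where the paper uses its learned two-stage super-resolution framework, and the function names, cube size, patch size, and stride are assumptions chosen for the example.

```python
import numpy as np


def upscale_nearest(hsi: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical stand-in for a learned super-resolution model.

    Replicates each pixel `scale` times along height and width.
    `hsi` has shape (height, width, bands); the band axis is untouched,
    so every output pixel keeps the spectrum of its source pixel.
    """
    return np.repeat(np.repeat(hsi, scale, axis=0), scale, axis=1)


def crop_patches(hsi: np.ndarray, patch: int, stride: int) -> np.ndarray:
    """Slide a patch x patch window over the spatial axes and stack the crops."""
    height, width, _ = hsi.shape
    crops = [
        hsi[top:top + patch, left:left + patch, :]
        for top in range(0, height - patch + 1, stride)
        for left in range(0, width - patch + 1, stride)
    ]
    return np.stack(crops)


# Toy cube: 64 x 64 pixels with 48 bands (illustrative sizes only).
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 48))

# Same patch size and stride, before and after a 4x spatial upscale.
patches_before = crop_patches(cube, patch=32, stride=32)
patches_after = crop_patches(upscale_nearest(cube, scale=4), patch=32, stride=32)
print(patches_before.shape[0], patches_after.shape[0])
```

With these toy sizes, the 64 x 64 cube yields 4 non-overlapping 32 x 32 patches, while the 4x upscaled 256 x 256 cube yields 64. Nearest-neighbor replication adds no new spatial detail, which is why a learned super-resolution model is needed for the patches to be genuinely diverse.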

Paper Structure

This paper contains 28 sections, 9 equations, 12 figures, and 9 tables.

Figures (12)

  • Figure 1: A schematic comparison of existing HSI augmentation methods. UBF [yu2024unmixing] alleviates the data sacrifice issue by performing spectral super-resolution on external RGB images, which yields data with content similar to the RGB images but with less authentic spectral profiles. In contrast, our method performs spatial super-resolution on existing real HSIs, ensuring that both the content and the spectral distribution are consistent with real HSIs.
  • Figure 2: Visualization results of our proposed HSIGene in different situations. When more conditions are provided, they complement each other to achieve more accurate generation. (a) Unconditional generation results. (b) Generation results under a single condition and under multiple conditions.
  • Figure 3: Overview of the generative model used in our work. The ControlNet encoder incorporates information from the various conditions, and hyperspectral images are generated through the diffusion process.
  • Figure 4: Overview of the proposed two-stage super-resolution framework. The RGB bands of the HSIs are first super-resolved with a diffusion-based model (DSRNet). The high-resolution HSIs are then obtained with a guided super-resolution network (RGAN) that uses the enhanced RGB bands as auxiliary prior information.
  • Figure 5: Overview of the proposed RGAN, which is composed of multiple guided attention layers (GALs). Each GAL consists of a self attention layer (SAL), a cross attention layer (CAL), a spectral attention layer (SpecAL), and a feed forward layer (FFD). The network transfers fine details from the RGB modality into the hyperspectral modality, so that the super-resolved hyperspectral images retain high fidelity and sharpness.
  • ...and 7 more figures
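
The guided attention idea in the Figure 5 caption, where fine detail from the RGB modality is transferred into the hyperspectral features, can be illustrated with plain scaled dot-product cross attention. This is a generic sketch under stated assumptions, not the RGAN architecture: letting HSI tokens act as queries while RGB tokens supply keys and values is an assumption, and the random projection matrices, token counts, and feature width are hypothetical. The rectangular windowing, SAL, SpecAL, and FFD are omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(hsi_tokens, rgb_tokens, w_q, w_k, w_v):
    """Generic cross attention: HSI tokens query RGB tokens (assumed roles).

    hsi_tokens: (n_hsi, d) features from the hyperspectral branch
    rgb_tokens: (n_rgb, d) features from the RGB guide branch
    Returns the residual-updated HSI tokens and the attention weights.
    """
    q = hsi_tokens @ w_q                      # (n_hsi, d)
    k = rgb_tokens @ w_k                      # (n_rgb, d)
    v = rgb_tokens @ w_v                      # (n_rgb, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_hsi, n_rgb)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return hsi_tokens + weights @ v, weights


# Toy sizes (hypothetical): 16 HSI tokens, 64 RGB tokens, width 8.
rng = np.random.default_rng(0)
d = 8
hsi_tokens = rng.standard_normal((16, d))
rgb_tokens = rng.standard_normal((64, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

out, weights = cross_attention(hsi_tokens, rgb_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each output token is the original HSI token plus a weighted mix of RGB value vectors, so the guide can inject spatial detail without replacing the hyperspectral features.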