HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion

Chia-Ming Lee, Yu-Fan Lin, Yu-Hao Ho, Li-Wei Kang, Chih-Chung Hsu

TL;DR

HyFusion addresses the challenge of reconstructing high-resolution hyperspectral images (HR-HSIs) by fusing high-resolution multispectral images (HR-MSIs) with low-resolution HSIs (LR-HSIs) using a Dual-Coupled Network (DCN). Central to the approach is the Enhanced Reception Field Block (ERFB), which combines dense feature reuse with Improved Swin Transformer Layers to expand the receptive field and capture long-range spatial-spectral dependencies. Task-specific losses, including Spectral Angle Mapper (SAM) and Stationary Wavelet Transform (SWT) losses, guide the model toward faithful spectral and spatial-spectral reconstruction, yielding state-of-the-art results on AVIRIS data under varying data availability. The framework demonstrates strong data efficiency and practical viability for resource-constrained hyperspectral imaging scenarios, with extensive experiments showing improved PSNR, SAM, RMSE, and ERGAS while maintaining a compact model size.
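The four metrics named above have standard textbook definitions, sketched below in plain Python. This is a minimal illustration and not the authors' evaluation code: the function names, the `peak` value, the flat-list data layout, and the choice of the reference band mean as the ERGAS normalizer are all assumptions made for the example.

```python
import math

def sam_degrees(ref, est, eps=1e-12):
    """Spectral angle in degrees between two spectra (equal-length sequences)."""
    dot = sum(r * e for r, e in zip(ref, est))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(e * e for e in est))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm + eps)))
    return math.degrees(math.acos(cos))

def rmse(ref, est):
    """Root-mean-square error between two flat sequences of pixel values."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref))

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum pixel value."""
    err = rmse(ref, est)
    return math.inf if err == 0 else 20.0 * math.log10(peak / err)

def ergas(ref_bands, est_bands, scale):
    """ERGAS over a list of bands (each a flat list); `scale` is the upsampling factor."""
    total = 0.0
    for ref_band, est_band in zip(ref_bands, est_bands):
        band_mean = sum(ref_band) / len(ref_band)  # assumed normalizer: reference mean
        total += (rmse(ref_band, est_band) / band_mean) ** 2
    return 100.0 / scale * math.sqrt(total / len(ref_bands))
```

Higher PSNR is better, while lower SAM, RMSE, and ERGAS are better. SAM is insensitive to a uniform scaling of the spectrum, which is why it is usually reported alongside an intensity-based error such as RMSE.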

Abstract

Hyperspectral image (HSI) fusion addresses the challenge of reconstructing High-Resolution HSIs (HR-HSIs) from High-Resolution Multispectral Images (HR-MSIs) and Low-Resolution HSIs (LR-HSIs), a critical task given the high costs and hardware limitations associated with acquiring high-quality HSIs. While existing methods leverage spatial and spectral relationships, they often suffer from limited receptive fields and insufficient feature utilization, leading to suboptimal performance. Furthermore, the scarcity of high-quality HSI data highlights the importance of efficient data utilization to maximize reconstruction quality. To address these issues, we propose HyFusion, a novel Dual-Coupled Network (DCN) framework designed to enhance cross-domain feature extraction and enable effective feature map reuse. The framework first processes the HR-MSI and LR-HSI inputs through specialized subnetworks that mutually enhance each other during feature extraction, preserving complementary spatial and spectral details. At its core, HyFusion utilizes an Enhanced Reception Field Block (ERFB), which combines shifted-window attention and dense connections to expand the receptive field, effectively capturing long-range dependencies while minimizing information loss. Extensive experiments demonstrate that HyFusion achieves state-of-the-art performance in HR-MSI/LR-HSI fusion, significantly improving reconstruction quality while maintaining a compact model size and computational efficiency. By integrating enhanced receptive fields and feature map reuse into a coupled network architecture, HyFusion provides a practical and effective solution for HSI fusion in resource-constrained scenarios, setting a new benchmark in hyperspectral imaging. Our code will be publicly available.
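To illustrate why window-based attention with a shift widens the receptive field, the sketch below assigns each pixel of a small grid to an attention window, with and without a cyclic shift. It follows the generic Swin-style scheme and is not the paper's Improved Swin Transformer Layer; the function name `window_ids` and its parameters are hypothetical, and the grid height and width are assumed to be divisible by the window size.

```python
def window_ids(height, width, window, shift=0):
    """Return a grid whose cells hold the id of the attention window they fall in.

    The map is cyclically shifted by `shift` pixels along both axes before it is
    cut into non-overlapping `window` x `window` tiles (generic Swin-style scheme).
    """
    windows_per_row = width // window
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Position of this pixel after the cyclic shift.
            shifted_y = (y - shift) % height
            shifted_x = (x - shift) % width
            row.append((shifted_y // window) * windows_per_row + shifted_x // window)
        grid.append(row)
    return grid

regular = window_ids(4, 4, window=2)           # plain window partition
shifted = window_ids(4, 4, window=2, shift=1)  # shifted by half a window
```

In `regular`, pixels (1, 1) and (2, 2) lie in different windows and cannot attend to each other; in `shifted` they share a window. Alternating the two partitions lets information cross window boundaries layer by layer. The cyclic wrap also groups pixels from opposite borders, which Swin-style implementations typically mask out inside attention.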

Paper Structure (13 sections, 11 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Visualization of the effective receptive field. (a) Naive Swin Transformer Layer (STL), as in the RSTB used for deep feature extraction in SwinIR; (b) with dense connections (using the RDG of Hsu et al., CVPR 2024); (c) with dense connections and the improved STL (ours).
  • Figure 2: The proposed HyFusion network architecture integrates an Enhanced Reception Field Block (ERFB) with an Improved Swin Transformer Layer (ISTL) for HSI/MSI fusion. Through the Dual-Coupled Network (DCN), the architecture learns spatial-spectral representations across two specialized branches. Combining ERFB, ISTL, and DCN captures long-range dependencies while keeping the design streamlined; the efficient information exchange between the spatial and spectral domains yields superior fusion results and improves data efficiency for real-world HSI applications.
  • Figure 3: Performance comparison between several HSI fusion models. (a) Evaluation of model performance with varying training data sizes. (b) Validation performance curves during training with 5% training samples.
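
The dense connections referred to in Figure 1(b) and 1(c) can be illustrated with a toy example in which every layer receives the block input concatenated with all earlier layer outputs. This is a sketch of the general dense-connectivity pattern only, not the ERFB itself: `dense_block` and `make_toy_layer` are hypothetical names, and the toy layer (a channel mean) stands in for a learned transformer layer.

```python
def dense_block(x, layers):
    """Apply layers with dense connectivity and return all features concatenated.

    Layer i receives the block input plus the outputs of layers 0..i-1, so no
    intermediate feature is discarded along the way.
    """
    features = [list(x)]
    for layer in layers:
        stacked = [v for feature in features for v in feature]  # channel concatenation
        features.append(list(layer(stacked)))
    return [v for feature in features for v in feature]

def make_toy_layer(growth):
    """Hypothetical stand-in for a learned layer: emits `growth` channels,
    each equal to the mean of all channels it receives."""
    def layer(channels):
        mean = sum(channels) / len(channels)
        return [mean] * growth
    return layer

# Two input channels, three layers that each add two channels.
out = dense_block([1.0, 3.0], [make_toy_layer(2) for _ in range(3)])
```

The output keeps the two input channels untouched and appends two channels per layer, so the third layer sees six channels rather than only the output of the second. That reuse of earlier feature maps is the property the figure attributes to the dense variants.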