HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion
Chia-Ming Lee, Yu-Fan Lin, Yu-Hao Ho, Li-Wei Kang, Chih-Chung Hsu
TL;DR
HyFusion addresses the challenge of reconstructing high-resolution hyperspectral images (HR-HSIs) by fusing high-resolution multispectral images (HR-MSIs) with low-resolution HSIs (LR-HSIs) using a Dual-Coupled Network (DCN). Central to the approach is the Enhanced Reception Field Block (ERFB), which combines dense feature reuse with Improved Swin Transformer Layers to expand the receptive field and capture long-range spatial-spectral dependencies. Task-specific losses, including Spectral Angle Mapper (SAM) and Stationary Wavelet Transform (SWT) losses, guide the model toward faithful spectral and spatial-spectral reconstruction, yielding state-of-the-art results on AVIRIS data under varying data availability. The framework demonstrates strong data efficiency and practical viability for resource-constrained hyperspectral imaging scenarios, with extensive experiments showing improved PSNR, SAM, RMSE, and ERGAS while maintaining a compact model size.
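As a point of reference for the SAM criterion mentioned above, the following is a minimal sketch of the standard Spectral Angle Mapper computation: the angle between a reference spectrum and an estimated spectrum, averaged over pixels. It is not the authors' loss implementation. The function names `spectral_angle` and `mean_sam_degrees` and the `eps` stabilizer are illustrative assumptions, and pixels are represented as plain Python sequences of band values.

```python
import math


def spectral_angle(ref, est, eps=1e-12):
    """Angle in radians between two spectra given as sequences of band values.

    `eps` is an assumed stabilizer that avoids division by zero for
    all-zero spectra; it is not taken from the paper.
    """
    dot = sum(r * e for r, e in zip(ref, est))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_est = math.sqrt(sum(e * e for e in est))
    cos = dot / (norm_ref * norm_est + eps)
    # Clamp to the valid domain of acos to guard against rounding error.
    cos = max(-1.0, min(1.0, cos))
    return math.acos(cos)


def mean_sam_degrees(ref_pixels, est_pixels):
    """Mean spectral angle, in degrees, over paired lists of pixel spectra."""
    angles = [spectral_angle(r, e) for r, e in zip(ref_pixels, est_pixels)]
    return math.degrees(sum(angles) / len(angles))
```

Because the angle depends only on the direction of each spectrum, scaling an estimated spectrum by a positive constant leaves SAM essentially unchanged, which is why it is typically paired with intensity-sensitive measures such as PSNR or RMSE.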
Abstract
Hyperspectral image (HSI) fusion addresses the challenge of reconstructing High-Resolution HSIs (HR-HSIs) from High-Resolution Multispectral Images (HR-MSIs) and Low-Resolution HSIs (LR-HSIs), a critical task given the high cost and hardware limitations of acquiring high-quality HSIs. While existing methods leverage spatial and spectral relationships, they often suffer from limited receptive fields and insufficient feature utilization, leading to suboptimal performance. Furthermore, the scarcity of high-quality HSI data makes efficient data utilization essential for maximizing reconstruction quality. To address these issues, we propose HyFusion, a novel Dual-Coupled Network (DCN) framework designed to enhance cross-domain feature extraction and enable effective feature map reuse. The framework first processes the HR-MSI and LR-HSI inputs through specialized subnetworks that mutually enhance each other during feature extraction, preserving complementary spatial and spectral details. At its core, HyFusion utilizes an Enhanced Reception Field Block (ERFB), which combines shifting-window attention and dense connections to expand the receptive field, effectively capturing long-range dependencies while minimizing information loss. Extensive experiments demonstrate that HyFusion achieves state-of-the-art performance in HR-MSI/LR-HSI fusion, significantly improving reconstruction quality while maintaining a compact model size and computational efficiency. By integrating enhanced receptive fields and feature map reuse into a coupled network architecture, HyFusion provides a practical and effective solution for HSI fusion in resource-constrained scenarios, setting a new benchmark in hyperspectral imaging. Our code will be publicly available.
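To make the shifting-window idea concrete, the sketch below shows the generic window partitioning and cyclic shift used in Swin-style attention, on a small 2-D grid of pixel indices. It illustrates only the grouping mechanism and should not be read as the ERFB or as the authors' Improved Swin Transformer Layers: the attention computation, masking of wrapped-around positions, and dense connections are omitted, and the helper names `cyclic_shift` and `window_partition` are hypothetical.

```python
def cyclic_shift(grid, shift):
    """Roll a 2-D grid (list of rows) up and left by `shift` positions,
    wrapping around at the borders."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(i + shift) % h][(j + shift) % w] for j in range(w)]
        for i in range(h)
    ]


def window_partition(grid, win):
    """Split a 2-D grid into non-overlapping win x win windows.

    Each window is returned as a flat list in row-major order; attention
    would then be computed independently inside each window.
    """
    h, w = len(grid), len(grid[0])
    if h % win or w % win:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append(
                [grid[i][j]
                 for i in range(top, top + win)
                 for j in range(left, left + win)]
            )
    return windows


# 4x4 grid whose entries are pixel indices 0..15.
grid = [[4 * i + j for j in range(4)] for i in range(4)]
regular = window_partition(grid, 2)
shifted = window_partition(cyclic_shift(grid, 1), 2)
```

In this toy case the first regular window groups pixels 0, 1, 4, 5, while the first shifted window groups pixels 5, 6, 9, 10, one from each of the four regular windows. Alternating the two partitions across layers is what lets information cross window boundaries and the effective receptive field grow.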
