
Fourier-enhanced Implicit Neural Fusion Network for Multispectral and Hyperspectral Image Fusion

Yu-Jie Liang, Zihan Cao, Liang-Jian Deng, Xiao Wu

TL;DR

This work tackles multispectral and hyperspectral image fusion (MHIF) by addressing the high-frequency information loss and limited global context of conventional implicit neural representations. It introduces FeINFN, a dual-domain framework that transforms latent codes into the Fourier domain and fuses them via a Spatial-Frequency Implicit Fusion Function (Spa-Fre IFF), complemented by a Spatial-Frequency Interactive Decoder (SFID) that employs a complex Gabor wavelet activation to promote robust cross-domain interaction. The method provides a theoretical basis for the Gabor activation's time-frequency tightness and demonstrates state-of-the-art performance on the CAVE and Harvard MHIF benchmarks, supported by comprehensive ablations of the spatial and Fourier components. The work suggests a generalizable approach for frequency-aware, implicit fusion in high-resolution image synthesis tasks, with code to be released on GitHub.
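The amplitude/phase view of a latent code mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `amplitude_phase` and `recombine`, the `(C, H, W)` layout, and the random test tensor are all assumptions chosen for illustration. It only shows how a feature map is split into Fourier amplitude and phase and then reassembled.

```python
import numpy as np

def amplitude_phase(z):
    """Split a real feature map z of shape (C, H, W) into Fourier amplitude and phase.

    Hypothetical helper for illustration; the 2D FFT is taken over the two spatial axes.
    """
    spec = np.fft.fft2(z, axes=(-2, -1))
    return np.abs(spec), np.angle(spec)

def recombine(amplitude, phase):
    """Rebuild the spatial feature map from amplitude and phase (inverse of the split)."""
    spec = amplitude * np.exp(1j * phase)
    # The input was real, so the imaginary part is only numerical noise.
    return np.fft.ifft2(spec, axes=(-2, -1)).real

# Stand-in latent code (assumption: 4 channels, 8x8 spatial grid).
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8, 8))

amp, pha = amplitude_phase(z)
z_rec = recombine(amp, pha)
```

Because the split is lossless, `z_rec` matches `z` up to floating-point error. A dual-domain method can therefore process amplitude and phase separately and still map the result back to the spatial domain.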

Abstract

Implicit neural representations (INRs) have recently made significant strides in various vision tasks and offer a novel solution for multispectral and hyperspectral image fusion (MHIF). However, INRs are prone to losing high-frequency information and lack global perceptual capability. To address these issues, this paper introduces a Fourier-enhanced Implicit Neural Fusion Network (FeINFN) designed specifically for the MHIF task. It is motivated by the following observation: the Fourier amplitudes of the latent codes of the HR-HSI and the LR-HSI are remarkably similar, whereas their phases exhibit different patterns. In FeINFN, we propose a spatial and frequency implicit fusion function (Spa-Fre IFF) that helps the INR capture high-frequency information and expands its receptive field. In addition, we design a new decoder, the Spatial-Frequency Interactive Decoder (SFID), which employs a complex Gabor wavelet activation function to enhance the interaction of INR features. We further prove theoretically that the Gabor wavelet activation possesses a time-frequency tightness property that favors learning the optimal bandwidths in the decoder. Experiments on two benchmark MHIF datasets verify the state-of-the-art (SOTA) performance of the proposed method, both visually and quantitatively, and ablation studies confirm the contribution of each component. The code will be available on Anonymous GitHub (https://anonymous.4open.science/r/FeINFN-15C9/) after possible acceptance.
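To make the complex Gabor wavelet activation more concrete, here is a minimal sketch assuming one common form of the wavelet, $\sigma(x) = e^{j\omega_0 x}\, e^{-|s_0 x|^2}$. The exact parameterization used in SFID is not given in this section, so the function name `complex_gabor` and the default values of `omega0` and `s0` are hypothetical.

```python
import numpy as np

def complex_gabor(x, omega0=10.0, s0=10.0):
    """Complex Gabor wavelet: a complex sinusoid under a Gaussian envelope.

    Assumed form: exp(j * omega0 * x) * exp(-|s0 * x|^2).
    omega0 sets the centre frequency, s0 the width of the envelope.
    Both defaults are illustrative, not values from the paper.
    """
    x = np.asarray(x, dtype=np.complex128)
    return np.exp(1j * omega0 * x - np.abs(s0 * x) ** 2)

# Evaluate on a few real pre-activations.
x = np.linspace(-0.5, 0.5, 5)
y = complex_gabor(x)
```

For real inputs the magnitude of the output is exactly the Gaussian envelope $e^{-(s_0 x)^2}$, so the response is localized in space. The Fourier transform of that envelope is again a Gaussian, centred at $\omega_0$, so the response is localized in frequency as well.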

Paper Structure (14 sections, 13 equations, 7 figures, 4 tables)


Figures (7)

  • Figure 1: Comparison of our method with other methods on the CAVE ($\times 8$) and Harvard ($\times 8$) datasets. Points closer to the top-right corner indicate better performance, and the size of each circle indicates the number of parameters in the model.
  • Figure 2: The amplitudes of the latent codes produced by the encoder from the HR-HSI and from the LR-HSI (combined with the HR-MSI) are similar, but their phases differ. $E_{\psi^*}$ is a trained encoder.
  • Figure 3: The flowchart of the FeINFN framework, which is composed of a spectral encoder $E_{\chi}$, a spatial encoder $E_{\psi}$, MHIF task-designed spatial and Fourier domain implicit fusion functions, and a pixel space mapping decoder. Note that $\mathbf I^{LR}$ is the LR-HSI, $\mathbf I^{HR}$ is the HR-MSI, $\mathbf I^{LR}_{up}$ is the bicubically interpolated LR-HSI, and $\mathbf X^{HR}$ is the HR normalized 2D coordinate map. $\mathbf z_{spe}$, $\mathbf z_{spa}$, $\mathbf z_{hp}$, and $\delta \mathbf x$ correspond to individual pixel units; $\mathcal{A}$ and $\mathcal{P}$ represent amplitude and phase, respectively.
  • Figure 4: A $3\times 3$ convolution suffers from spectrum leakage, which a $1\times 1$ convolution can alleviate.
  • Figure 5: Detailed composition of the proposed SFID.
  • ...and 2 more figures