HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising

Qizhou Wang, Li Pang, Xiangyong Cao, Zhiqiang Tian, Deyu Meng

TL;DR

The paper tackles the ill-posed nature of hyperspectral image denoising by learning the conditional distribution $P(X|Y)$ instead of a single deterministic mapping. It introduces HIDFlowNet, a flow-based architecture with a non-invertible conditional encoder to capture global low-frequency information and an invertible decoder to generate local high-frequency details, enabling diverse clean HSI samples by drawing $z\sim p_z$ and applying inverse transforms. The model optimizes a negative log-likelihood term plus a reconstruction loss, leveraging a diagonal Jacobian for efficient training, and demonstrates robust performance on synthetic and real datasets with stable, diverse restorations. This approach provides a principled way to handle denoising as sampling from a conditional distribution, improving detail preservation while offering multiple valid clean reconstructions for the same noisy input.
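
The objective and sampling procedure described above can be illustrated with a toy conditional flow. This is a minimal sketch, not the paper's network: `conditional_params` is a hypothetical stand-in for the conditional encoder, the flow is a single elementwise affine map (so its Jacobian is diagonal), and the reconstruction-loss term is omitted because its exact form is not given here.

```python
import numpy as np

def conditional_params(y):
    """Hypothetical stand-in for the conditional encoder: noisy y -> (mean, log-scale)."""
    mu = 0.9 * y                        # crude estimate of the clean signal (assumption)
    log_sigma = np.full_like(y, -2.0)   # fixed per-element log-scale (assumption)
    return mu, log_sigma

def forward(x, y):
    """Map a clean image x to a latent z given y; also return log|det dz/dx|."""
    mu, log_sigma = conditional_params(y)
    z = (x - mu) * np.exp(-log_sigma)
    # Elementwise map => diagonal Jacobian, so the log-determinant is a plain sum.
    log_det = -np.sum(log_sigma)
    return z, log_det

def inverse(z, y):
    """Map a latent z back to image space given y (exact inverse of forward)."""
    mu, log_sigma = conditional_params(y)
    return mu + np.exp(log_sigma) * z

def nll(x, y):
    """Negative log-likelihood -log p(x|y) under a standard normal prior on z."""
    z, log_det = forward(x, y)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return -(log_pz + log_det)

rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))                    # toy "clean HSI": 4 bands, 8x8 pixels
y = x + 0.1 * rng.standard_normal(x.shape)   # toy noisy observation

z, _ = forward(x, y)
x_rec = inverse(z, y)                        # invertibility: recovers x
# Different draws z ~ N(0, I) give different restorations for the same y.
samples = [inverse(rng.standard_normal(x.shape), y) for _ in range(3)]
```

Training would minimize `nll(x, y)` over the parameters of the conditional network; at inference, each draw of $z$ yields one candidate clean image for the same noisy input.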

Abstract

Hyperspectral image (HSI) denoising is essentially ill-posed since a noisy HSI can be degraded from multiple clean HSIs. However, existing deep learning (DL)-based approaches only restore one clean HSI from the given noisy HSI with a deterministic mapping, thus ignoring the ill-posed issue and always resulting in an over-smoothing problem. Additionally, these DL-based methods often neglect that noise is part of the high-frequency component, and their network architectures fail to decouple the learning of low-frequency and high-frequency information. To alleviate these issues, this paper proposes a flow-based HSI denoising network (HIDFlowNet) that directly learns the conditional distribution of the clean HSI given the noisy HSI, so that diverse clean HSIs can be sampled from the conditional distribution. Overall, our HIDFlowNet is induced from the generative flow model and comprises an invertible decoder and a conditional encoder, which explicitly decouple the learning of low-frequency and high-frequency information of the HSI. Specifically, the invertible decoder is built by stacking a succession of invertible conditional blocks (ICBs) to capture the local high-frequency details. The conditional encoder utilizes down-sampling operations to obtain low-resolution images and uses transformers to capture long-range correlations so that global low-frequency information can be effectively extracted. Extensive experiments on simulated and real HSI datasets verify that our proposed HIDFlowNet obtains better or comparable results compared with other state-of-the-art methods.
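
To make the low-frequency/high-frequency decoupling concrete, the following sketch splits an image into a low-resolution component and a residual. It is only an illustration under simple assumptions: average pooling and nearest-neighbour upsampling stand in for the learned down-sampling and transformer stages, and the function names are hypothetical.

```python
import numpy as np

def downsample(x, f=2):
    """Average-pool each band by a factor f (height and width must be divisible by f)."""
    c, h, w = x.shape
    return x.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))

def upsample(x, f=2):
    """Nearest-neighbour upsampling of each band by a factor f."""
    return x.repeat(f, axis=1).repeat(f, axis=2)

rng = np.random.default_rng(0)
noisy = rng.random((4, 8, 8))        # toy HSI: 4 bands, 8x8 pixels

low = upsample(downsample(noisy))    # coarse, low-frequency content
high = noisy - low                   # residual detail; noise also ends up here
```

By construction `low + high` reproduces the input, and the residual averages to zero within each pooling window, which is why noise concentrates in the high-frequency part.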

Paper Structure (29 sections, 10 equations, 8 figures, 7 tables)

This paper contains 29 sections, 10 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Instead of performing HSI denoising with a deterministic mapping, our HIDFlowNet learns the conditional distribution of the clean HSI given its noisy counterpart, which explicitly alleviates the ill-posed nature of HSI denoising and enables us to sample diverse clean HSIs. The charts on the right demonstrate that the spectral reflectance reconstructed by our HIDFlowNet is more consistent with the ground truth than that of other approaches (the spectral RMSE of our method is 0.05, while that of the second-best method, QRNN3D, is 0.07).
  • Figure 2: The network architecture of HIDFlowNet includes a conditional encoder (yellow) and an invertible decoder (blue). The encoder takes the noisy HSI as input and generates multi-scale feature maps with a series of transformer blocks and down-sampling operations. The invertible decoder transforms a latent representation that follows a simple distribution (e.g., a Gaussian distribution) into high-frequency information using a succession of invertible conditional blocks guided by the encoder. Finally, the low- and high-frequency parts are merged to restore the clean HSI. The whole framework is trained by minimizing the negative log-likelihood and a reconstruction loss, and can then predict diverse clean HSIs at inference time.
  • Figure 3: The invertible conditional block is composed of an invertible conditional affine layer and a residual invertible convolution layer. The encoder feature map $\textbf{t}^n$ is processed through an upsampling layer and a HinCaBlock to generate the scale and bias terms of the affine transform. The output $\textbf{h}^{n+1}$ is then generated by performing an invertible convolution.
  • Figure 4: Details of the HinCaBlock, which consists of a half instance normalization block and a channel attention layer.
  • Figure 5: Visual comparison of denoising results on the 104th band in Urban dataset.
  • ...and 3 more figures
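
The invertible conditional block described in the Figure 3 caption can be sketched as a conditional affine transform followed by an invertible channel-mixing convolution. The code below is a simplified illustration under stated assumptions: a plain 1x1 invertible convolution (in the style of Glow) replaces the paper's residual invertible convolution, fixed random linear maps (`A_s`, `A_b`) stand in for the HinCaBlock, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8

# Orthogonal initialization guarantees the 1x1 convolution weight is invertible.
W_conv, _ = np.linalg.qr(rng.standard_normal((C, C)))
# Hypothetical stand-ins for the HinCaBlock heads producing scale and bias.
A_s = 0.1 * rng.standard_normal((C, C))
A_b = 0.1 * rng.standard_normal((C, C))

def upsample(t, f=2):
    """Nearest-neighbour upsampling of an encoder feature map."""
    return t.repeat(f, axis=1).repeat(f, axis=2)

def scale_and_bias(t):
    """Compute log-scale and bias from the (upsampled) encoder feature t."""
    u = upsample(t)
    log_s = np.tanh(np.einsum('ij,jhw->ihw', A_s, u))  # bounded for stability
    b = np.einsum('ij,jhw->ihw', A_b, u)
    return log_s, b

def icb_forward(h, t):
    """Conditional affine layer, then 1x1 convolution; returns output and log|det|."""
    log_s, b = scale_and_bias(t)
    a = np.exp(log_s) * h + b
    out = np.einsum('ij,jhw->ihw', W_conv, a)
    # The affine part has a diagonal Jacobian; the 1x1 conv adds H*W*log|det W|.
    log_det = log_s.sum() + h.shape[1] * h.shape[2] * np.linalg.slogdet(W_conv)[1]
    return out, log_det

def icb_inverse(out, t):
    """Undo the convolution, then undo the affine transform."""
    log_s, b = scale_and_bias(t)
    a = np.einsum('ij,jhw->ihw', np.linalg.inv(W_conv), out)
    return (a - b) * np.exp(-log_s)

h = rng.standard_normal((C, H, W))            # block input h^n
t = rng.standard_normal((C, H // 2, W // 2))  # encoder feature t^n at half resolution
out, log_det = icb_forward(h, t)
h_rec = icb_inverse(out, t)                   # recovers h up to numerical error
```

Because the scale and bias depend only on the encoder feature and not on the block input, the block stays invertible no matter how complex the conditioning network is.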