
HyVIC: A Metric-Driven Spatio-Spectral Hyperspectral Image Compression Architecture Based on Variational Autoencoders

Martin Hermann Paul Fuchs, Behnood Rasti, Begüm Demir

Abstract

The rapid growth of hyperspectral data archives in remote sensing (RS) necessitates effective compression methods for storage and transmission. Recent advances in learning-based hyperspectral image (HSI) compression have significantly enhanced both reconstruction fidelity and compression efficiency. However, existing methods typically adapt variational image compression models designed for natural images, without adequately accounting for the distinct spatio-spectral redundancies inherent in HSIs. In particular, they lack explicit architectural designs to balance spatial and spectral feature learning, limiting their ability to effectively leverage the unique characteristics of hyperspectral data. To address this issue, we introduce the spatio-spectral variational hyperspectral image compression architecture (HyVIC). The proposed model comprises four main components: 1) adjustable spatio-spectral encoder; 2) spatio-spectral hyperencoder; 3) spatio-spectral hyperdecoder; and 4) adjustable spatio-spectral decoder. We demonstrate that the trade-off between spatial and spectral feature learning is crucial for the reconstruction fidelity, and therefore present a metric-driven strategy to systematically select the hyperparameters of the proposed model. Extensive experiments on two benchmark datasets demonstrate the effectiveness of the proposed model, achieving high spatial and spectral reconstruction fidelity across a wide range of compression ratios (CRs) and improving the state of the art by up to 4.66 dB in terms of BD-PSNR. Based on our results, we offer insights and derive practical guidelines to guide future research directions in learning-based variational HSI compression. Our code and pre-trained model weights are publicly available at https://git.tu-berlin.de/rsim/hyvic .


Paper Structure

This paper contains 31 sections, 8 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Block diagram of the proposed HyVIC model. Initially, the encoder $E_\Phi$ transforms the input HSI $\mathbf{X}$ into its latent representation $\mathbf{Y}$, which is subsequently transformed by the hyperencoder $E^\mathcal{H}_\Psi$ to the hyperlatent $\mathbf{Z}$. $\mathbf{Y}$ and $\mathbf{Z}$ are quantized to $\mathbf{\hat{Y}}$ and $\mathbf{\hat{Z}}$, compressed into a bitstream, and subsequently reconstructed using arithmetic coding. $\mathbf{\hat{Z}}$ serves as side information to estimate both mean $\boldsymbol{\hat{\mu}}$ and scale $\boldsymbol{\hat{\sigma}}$ parameters via the hyperdecoder $D^\mathcal{H}_\Gamma$, which are used inside the GMM-based entropy model to encode $\mathbf{\hat{Y}}$. In contrast, $\mathbf{\hat{Z}}$ is encoded using a fully-factorized entropy model. Finally, the decoder $D_{\Phi'}$ reconstructs $\mathbf{\hat{X}}$ based on $\mathbf{\hat{Y}}$.
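The pipeline in Figure 1 can be sketched in a few lines of numpy. This is a toy illustration only: the learned transforms $E_\Phi$, $E^\mathcal{H}_\Psi$, $D^\mathcal{H}_\Gamma$, and $D_{\Phi'}$ are replaced by random linear maps, the GMM entropy model is simplified to a single Gaussian, and all shapes and variable names are assumptions, not taken from the paper.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

# Toy stand-ins (random linear maps) for the learned transforms; the
# dimensions C (bands), M (latent channels), N (hyperlatent channels)
# are illustrative, not the values used in the paper.
C, M, N = 16, 8, 4
E = rng.normal(0, 0.1, (M, C))           # encoder E_Phi
EH = rng.normal(0, 0.1, (N, M))          # hyperencoder E^H_Psi
DH = rng.normal(0, 0.1, (2 * M, N))      # hyperdecoder D^H_Gamma -> (mu, log sigma)
D = rng.normal(0, 0.1, (C, M))           # decoder D_Phi'

x = rng.normal(size=C)                   # one pixel's spectrum

y = E @ x                                # latent Y
z = EH @ y                               # hyperlatent Z
y_hat, z_hat = np.round(y), np.round(z)  # quantization

params = DH @ z_hat                      # side information -> entropy-model params
mu, sigma = params[:M], np.exp(params[M:])

# Rate of y_hat under a single-Gaussian conditional entropy model
# (the paper uses a GMM): P(y_hat) = CDF(y_hat + 0.5) - CDF(y_hat - 0.5).
cdf = lambda v, m, s: 0.5 * (1 + erf((v - m) / (s * np.sqrt(2))))
p = np.array([cdf(yh + 0.5, m, s) - cdf(yh - 0.5, m, s)
              for yh, m, s in zip(y_hat, mu, sigma)])
rate_bits = -np.log2(np.clip(p, 1e-9, 1)).sum()

x_hat = D @ y_hat                        # reconstruction X_hat
print(f"estimated rate for this pixel: {rate_bits:.1f} bits")
```

In a real implementation the rate term above would be minimized jointly with the distortion between `x` and `x_hat`, and arithmetic coding would turn the quantized symbols into an actual bitstream.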
  • Figure 2: True color representations of exemplary HSIs present in the HySpecNet-11k dataset [fuchs2023hyspecnet].
  • Figure 3: True color representations of exemplary HSIs present in the MLRetSet dataset [omruuzun2024novel].
  • Figure 4: Ablation study of kernel size $k$ on (a) reconstruction quality evaluated as PSNR, (b) memory usage in terms of parameter count, and (c) computational complexity measured in FLOPs. Results are reported on the HySpecNet-11k [fuchs2023hyspecnet] test set (easy split) for three $\lambda$ values, while fixing $S = 2$, $M = 768$, and $N = 460$.
  • Figure 5: Ablation study for spatial stages $S$, latent channels $M$, and hyperlatent channels $N$ on the RD performance for the HySpecNet-11k [fuchs2023hyspecnet] test set (easy split). Rate is visualized as CR on a logarithmic scale and distortion is given as (a) PSNR, (b) SSIM, and (c) SA.
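The ablation in Figure 5 reports distortion as PSNR (spatial fidelity) and spectral angle, SA (spectral fidelity). A minimal sketch of how these two metrics are commonly computed for an HSI cube is shown below; the function names, the toy data, and the `(H, W, C)` layout are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def psnr(x, x_hat, max_val=1.0):
    """Peak signal-to-noise ratio in dB over the whole cube (higher is better)."""
    mse = np.mean((x - x_hat) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

def spectral_angle(x, x_hat):
    """Mean spectral angle in radians between per-pixel spectra.
    Inputs have shape (H, W, C); lower is better."""
    dot = np.sum(x * x_hat, axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(x_hat, axis=-1)
    cos = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.mean(np.arccos(cos))

# Toy HSI cube and a slightly perturbed "reconstruction".
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (4, 4, 16))
x_hat = np.clip(x + rng.normal(0, 0.01, x.shape), 0, 1)
print(f"PSNR: {psnr(x, x_hat):.2f} dB, SA: {spectral_angle(x, x_hat):.4f} rad")
```

Tracking both metrics together is what makes the hyperparameter selection metric-driven: a configuration can improve PSNR while degrading SA, so the trade-off between spatial and spectral fidelity must be evaluated jointly.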
  • ...and 5 more figures