CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement

Rita Pucci, Niki Martinel

TL;DR

The Capsule Enhanced Variational AutoEncoder (CE-VAE), a novel architecture designed to efficiently compress and enhance degraded underwater images, achieves state-of-the-art performance in underwater image enhancement on six benchmark datasets, providing up to 3× higher compression efficiency than existing approaches.

Abstract

Unmanned underwater image analysis for marine monitoring faces two key challenges: (i) degraded image quality due to light attenuation and (ii) hardware storage constraints that limit high-resolution image collection. Existing methods primarily address image enhancement with approaches that hinge on storing the full-size input. In contrast, we introduce the Capsule Enhanced Variational AutoEncoder (CE-VAE), a novel architecture designed to efficiently compress and enhance degraded underwater images. Our attention-aware image encoder projects the input image onto a latent-space representation and can run online on a remote device, so the only information that needs to be stored on the device or sent to a beacon is a compressed representation. A dual-decoder module then performs full-size enhanced image generation offline. One branch reconstructs spatial details from the compressed latent space, while the second branch uses a capsule-clustering layer to capture entity-level structures and complex spatial relationships. This parallel decoding strategy enables the model to balance fine-detail preservation with context-aware enhancement. CE-VAE achieves state-of-the-art performance in underwater image enhancement on six benchmark datasets, providing up to 3× higher compression efficiency than existing approaches. Code is available at https://github.com/iN1k1/ce-vae-underwater-image-enhancement.
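To make the storage argument concrete, the sketch below compares the bits needed to store a raw RGB frame with the bits needed to store a compressed latent. It assumes, purely for illustration, that the latent is a grid of discrete codebook indices (as in vector-quantized autoencoders); the abstract does not specify the latent format, and the helper names and sizes (256×256 input, 16×16 grid, 1024-entry codebook) are hypothetical, not values from the paper.

```python
import math


def raw_image_bits(height: int, width: int, channels: int = 3, bits_per_channel: int = 8) -> int:
    """Storage cost of an uncompressed image, in bits."""
    return height * width * channels * bits_per_channel


def index_grid_bits(grid_h: int, grid_w: int, codebook_size: int) -> int:
    """Storage cost of a latent grid holding one codebook index per cell.

    Hypothetical scheme: each index uses ceil(log2(codebook_size)) bits.
    """
    bits_per_index = math.ceil(math.log2(codebook_size))
    return grid_h * grid_w * bits_per_index


def compression_ratio(raw_bits: int, latent_bits: int) -> float:
    """How many times smaller the stored latent is than the raw image."""
    return raw_bits / latent_bits


# Hypothetical sizes: 256x256 RGB input, 16x16 latent grid, 1024-entry codebook.
raw = raw_image_bits(256, 256)          # 256 * 256 * 3 * 8 bits
latent = index_grid_bits(16, 16, 1024)  # 16 * 16 * 10 bits
print(f"raw={raw} bits, latent={latent} bits, ratio={compression_ratio(raw, latent):.1f}x")
```

With these hypothetical sizes the script prints `raw=1572864 bits, latent=2560 bits, ratio=614.4x`. Halving the grid side length would cut the latent cost by a further 4×, which shows why latent resolution dominates on-device storage; the paper's reported 3× gain is a relative comparison between methods and is not reproduced here.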

Paper Structure (30 sections, 13 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Proposed CE-VAE architecture with the new capsule-vector latent-space clustering mechanism.
  • Figure 2: Proposed capsule vector clustering approach. It consists of a capsule layer and a transposed convolution layer. The capsules extract features $\mathbf{U}$, which are clustered by the routing-by-agreement (RbA) procedure to obtain $\mathbf{\hat{U}}$. We aggregate the matrices and upsample them with a transposed convolution layer.
  • Figure 3: Analysis of the decoder components. Results are shown for our architecture (i) without the spatial decoder (CE-VAE w/o $D_S$), (ii) without the capsule decoder (CE-VAE w/o $D_c$), and (iii) in its complete form (CE-VAE).
  • Figure 4: Evaluation of the different loss components and their impact on PSNR on the LSUI TestL-400 dataset.
  • Figure 5: Enhanced images comparison on the Color-Check7 dataset.
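Figure 2's caption names an RbA step that clusters capsule features $\mathbf{U}$ into $\mathbf{\hat{U}}$. A minimal sketch of that kind of step follows, assuming RbA is the dynamic routing-by-agreement of Sabour et al. (2017); the captions do not specify the variant, and the function names, tensor shapes, and three routing iterations are hypothetical. It covers only the routing loop, not the matrix aggregation or the transposed-convolution upsampling.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Capsule non-linearity: keeps each vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def routing_by_agreement(predictions: np.ndarray, num_iters: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Dynamic routing in the style of Sabour et al. (2017); an assumed stand-in for RbA.

    predictions: shape (num_in, num_out, dim), what input capsule i predicts for output capsule j.
    Returns (v, c): output capsules (num_out, dim) and coupling coefficients (num_in, num_out).
    """
    num_in, num_out, _ = predictions.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)  # each input capsule splits its vote across output capsules
        s = np.einsum("io,iod->od", c, predictions)  # coupling-weighted sum per output capsule
        v = squash(s)  # current output capsules
        b = b + np.einsum("iod,od->io", predictions, v)  # reward predictions that agree with outputs
    return v, c


# Hypothetical sizes: 32 input capsules, 4 output capsules, 8-D capsule vectors.
rng = np.random.default_rng(0)
demo = rng.normal(size=(32, 4, 8))
v, c = routing_by_agreement(demo)
print(v.shape, c.shape)  # (4, 8) (32, 4)
```

Each iteration raises the logit between an input capsule and an output capsule when the input's prediction agrees (large dot product) with the current output, so inputs that agree end up clustered into the same output capsule.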