DISC-GAN: Disentangling Style and Content for Cluster-Specific Synthetic Underwater Image Generation

Sneha Varur, Anirudh R Hanchinamani, Tarun S Bagewadi, Uma Mudenagudi, Chaitra D Desai, Sujata C, Padmashree Desai, Sumit Meharwade

TL;DR

Underwater image synthesis is challenged by depth-dependent attenuation and scattering, leading to color distortion and haze. The authors present DISC-GAN, a disentangled style-content GAN with cluster-specific training that partitions data into four Jerlov-inspired style domains and uses AdaIN to transfer underwater style while preserving content. On the RSUIGM dataset, DISC-GAN achieves high SSIM (~0.90), PSNR (~32.5 dB), and low FID (~3.86–8.37) across clusters, approaching physics-grounded ground truth without explicit priors. This approach provides fine-grained control over underwater appearance and offers a practical data augmentation tool for marine robotics and related tasks.

Abstract

In this paper, we propose a novel framework, Disentangled Style-Content GAN (DISC-GAN), which integrates style-content disentanglement with a cluster-specific training strategy towards photorealistic underwater image synthesis. The quality of synthetic underwater images is challenged by optical phenomena such as color attenuation and turbidity. These phenomena appear as distinct stylistic variations across different waterbodies, such as changes in tint and haze. While generative models are well-suited to capture complex patterns, they often lack the ability to model the non-uniform conditions of diverse underwater environments. To address these challenges, we employ K-means clustering to partition a dataset into style-specific domains. We use separate encoders to obtain latent representations of style and content, integrate these representations via Adaptive Instance Normalization (AdaIN), and decode the result to produce the final synthetic image. The model is trained independently on each style cluster to preserve domain-specific characteristics. Our framework demonstrates state-of-the-art performance, obtaining a Structural Similarity Index (SSIM) of 0.9012, an average Peak Signal-to-Noise Ratio (PSNR) of 32.5118 dB, and a Fréchet Inception Distance (FID) of 13.3728.
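To make the AdaIN step concrete, the following is a minimal NumPy sketch of the standard AdaIN operation: content features are normalized per channel and then rescaled to the style features' per-channel mean and standard deviation. It is an illustration under stated assumptions, not the authors' implementation. The function name `adain`, the `(C, H, W)` feature layout, the `eps` value, and the random arrays standing in for encoder outputs are all hypothetical.

```python
import numpy as np


def adain(content_feat, style_feat, eps=1e-5):
    """Standard AdaIN on feature maps of shape (C, H, W).

    Hypothetical helper for illustration; the paper's encoders and
    decoder are not reproduced here.
    """
    # Per-channel statistics over the spatial dimensions.
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True) + eps

    # Remove the content statistics, then apply the style statistics.
    normalized = (content_feat - c_mean) / c_std
    return normalized * s_std + s_mean


# Random arrays stand in for content and style encoder outputs (assumption).
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(8, 16, 16))
style = rng.normal(3.0, 2.0, size=(8, 16, 16))

out = adain(content, style)
print(out.shape)
```

After this operation, each output channel carries the style's mean and standard deviation, while the spatial arrangement of activations still comes from the content features. In the framework described above, a decoder would then map such features back to an image.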

Paper Structure

This paper contains 14 sections, 8 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: The high-level design of our proposed DISC-GAN framework. DISC-GAN is trained using clusters obtained from image patches derived from the Jerlov water classification scheme, originally introduced by Jerlov in "Marine Optics" (1976). The visualization of the Jerlov water types and the associated example water patch are reproduced from Desai et al., "Realistic Synthetic Underwater Image Generation with Image Formation Model" (2024). This visualization originally appeared in Akkaynak and Treibitz, "What Is the Space of Attenuation Coefficients in Underwater Computer Vision?" (2017).
  • Figure 2: Example patches from the RSUIGM dataset used for style clustering.
  • Figure 3: The Elbow Method plot for K-means clustering on the 200 Jerlov classes, using the RGB values of each class. The "elbow" point at k=4 suggests it is the optimal number of clusters.
  • Figure 4: Visual comparison of K-means clustering results for k=3, 4, 5, and 6. The top row compares k=3 and k=4 (optimal), while the bottom row compares k=5 and k=6. Each pair shows the 3D RGB plot and its corresponding clustered Jerlov classes. The results for k=4 show the most visually coherent and distinct style separation.
  • Figure 5: A content feature vector is extracted from the input image using the conv4_2 layer of a VGG19 encoder, preserving high-level structural information.
  • ...and 2 more figures
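
The Elbow Method mentioned in the captions for Figures 3 and 4 can be sketched as follows: run K-means over a range of k on RGB values and inspect how the within-cluster sum of squares (inertia) falls as k grows. This is a minimal NumPy illustration, not the authors' code. The data are synthetic RGB points drawn around four made-up colors (not the RSUIGM patches), and the helper names `kmeans` and `best_inertia`, the restart count, and the iteration limit are assumptions.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(((points - centers[labels]) ** 2).sum())
    return labels, centers, inertia


def best_inertia(points, k, n_restarts=10):
    """Lowest inertia over several random restarts (hypothetical helper)."""
    return min(kmeans(points, k, seed=s)[2] for s in range(n_restarts))


# Synthetic stand-in data: 200 RGB points around four made-up colors.
rng = np.random.default_rng(42)
base_colors = np.array([
    [30.0, 90.0, 200.0],
    [40.0, 180.0, 170.0],
    [90.0, 160.0, 60.0],
    [150.0, 140.0, 40.0],
])
rgb = np.vstack([c + rng.normal(0.0, 8.0, size=(50, 3)) for c in base_colors])

# Inertia for k = 1..8; the "elbow" is where the curve stops dropping sharply.
inertia_by_k = {k: best_inertia(rgb, k) for k in range(1, 9)}
for k, value in inertia_by_k.items():
    print(k, round(value, 1))
```

Plotting `inertia_by_k` against k gives the kind of curve described for Figure 3. On real data the elbow is often less clear-cut than on well-separated synthetic blobs, which is presumably why the paper also compares the clusterings visually in Figure 4.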