DISC-GAN: Disentangling Style and Content for Cluster-Specific Synthetic Underwater Image Generation
Sneha Varur, Anirudh R Hanchinamani, Tarun S Bagewadi, Uma Mudenagudi, Chaitra D Desai, Sujata C, Padmashree Desai, Sumit Meharwade
TL;DR
Underwater image synthesis is challenged by depth-dependent attenuation and scattering, leading to color distortion and haze. The authors present DISC-GAN, a disentangled style-content GAN with cluster-specific training that partitions data into four Jerlov-inspired style domains and uses AdaIN to transfer underwater style while preserving content. On the RSUIGM dataset, DISC-GAN achieves high SSIM (~0.90), PSNR (~32.5 dB), and low FID (~3.86–8.37) across clusters, approaching physics-grounded ground truth without explicit priors. This approach provides fine-grained control over underwater appearance and offers a practical data augmentation tool for marine robotics and related tasks.
Abstract
In this paper, we propose a novel framework, Disentangled Style-Content GAN (DISC-GAN), which integrates style-content disentanglement with a cluster-specific training strategy for photorealistic underwater image synthesis. The quality of synthetic underwater images is challenged by optical phenomena such as color attenuation and turbidity. These phenomena manifest as distinct stylistic variations across different waterbodies, such as changes in tint and haze. While generative models are well-suited to capturing complex patterns, they often lack the ability to model the non-uniform conditions of diverse underwater environments. To address these challenges, we employ K-means clustering to partition a dataset into style-specific domains. We use separate encoders to obtain latent representations for style and content, integrate these representations via Adaptive Instance Normalization (AdaIN), and decode the result to produce the final synthetic image. The model is trained independently on each style cluster to preserve domain-specific characteristics. Our framework demonstrates state-of-the-art performance, obtaining a Structural Similarity Index (SSIM) of 0.9012, an average Peak Signal-to-Noise Ratio (PSNR) of 32.5118 dB, and a Fréchet Inception Distance (FID) of 13.3728.
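To illustrate the AdaIN step mentioned above, the following is a minimal NumPy sketch of the standard AdaIN operation: content features are normalized per channel and then rescaled with the per-channel mean and standard deviation of the style features. It is not the authors' implementation. The function name `adain`, the `(C, H, W)` feature-map layout, and the `eps` value are assumptions made for illustration; the abstract does not specify the shape of the latent representations or how the encoders and decoder are built.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Standard AdaIN on feature maps of shape (C, H, W).

    Hypothetical helper for illustration only: the (C, H, W) layout and
    eps value are assumptions, not details taken from the paper.
    """
    # Per-channel statistics over the spatial dimensions.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)

    # Remove the content statistics, then apply the style statistics.
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean
```

After this operation, each channel of the output carries the style features' mean and (up to the small `eps` term) standard deviation, while the spatial arrangement of the content features is unchanged. In a pipeline like the one described, the result would then be passed to the decoder.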
