Infinite Leagues Under the Sea: Photorealistic 3D Underwater Terrain Generation by Latent Fractal Diffusion Models
Tianyi Zhang, Weiming Zhi, Joshua Mangelson, Matthew Johnson-Roberson
TL;DR
DreamSea tackles the challenge of photorealistic 3D underwater terrain generation from unannotated RGB imagery by conditioning a diffusion model on fractal latent embeddings derived from a Diamond-Square process and on zero-shot features from visual foundation models. It integrates depth inference via Depth Anything v2, RGBD generation, and 3D Gaussian Splatting (3DGS) with Score Distillation Sampling to render consistent novel views, producing large-scale, diverse underwater scenes. Key contributions include fractal latent terrain control, zero-shot feature conditioning with DINOv2 and PCA, and a diffusion-guided RGBD-to-3D pipeline that yields spatially coherent 3D terrains suitable for filming, gaming, and robot simulation. The approach demonstrates robust realism and consistency across multiple real-world datasets and outlines paths toward underwater simulation environments. Its limitations lie in metric scaling and viewing-angle diversity.
Abstract
This paper tackles the problem of generating representations of underwater 3D terrain. Off-the-shelf generative models, trained on Internet-scale data but not on specialized underwater images, exhibit degraded realism, as images of the seafloor are relatively uncommon. To this end, we introduce DreamSea, a generative model that produces hyper-realistic underwater scenes. DreamSea is trained on real-world image databases collected from underwater robot surveys. Images from these surveys contain a massive number of real seafloor observations covering large areas, but are prone to real-world noise and artifacts. We extract 3D geometry and semantics from the data with visual foundation models, and train a diffusion model that generates realistic seafloor images in RGBD channels, conditioned on novel fractal distribution-based latent embeddings. We then fuse the generated images into a 3D map, building a 3D Gaussian Splatting (3DGS) model supervised by 2D diffusion priors, which allows photorealistic novel view rendering. DreamSea is rigorously evaluated, demonstrating the ability to robustly generate large-scale underwater scenes that are consistent, diverse, and photorealistic. Our work drives impact in multiple domains, spanning filming, gaming, and robot simulation.
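For readers unfamiliar with the Diamond-Square process named in the TL;DR, the following is a minimal sketch of the classic algorithm, which produces a fractal height grid by recursive midpoint displacement. It is not the authors' implementation: the function name `diamond_square` and the parameters `n`, `roughness`, and `seed` are illustrative assumptions, and how DreamSea maps such a grid to latent embeddings is not described in this section.

```python
import random


def diamond_square(n, roughness=0.5, seed=0):
    """Return a (2**n + 1) x (2**n + 1) fractal height grid as nested lists.

    Hypothetical helper for illustration; `roughness` is the factor by
    which the random displacement shrinks at each subdivision level.
    """
    rng = random.Random(seed)
    size = 2 ** n + 1
    grid = [[0.0] * size for _ in range(size)]

    # Seed the four corners with random heights.
    for r in (0, size - 1):
        for c in (0, size - 1):
            grid[r][c] = rng.uniform(-1.0, 1.0)

    step = size - 1
    scale = 1.0
    while step > 1:
        half = step // 2

        # Diamond step: the centre of each square becomes the mean of
        # its four corners plus a random displacement.
        for r in range(half, size, step):
            for c in range(half, size, step):
                avg = (grid[r - half][c - half] + grid[r - half][c + half]
                       + grid[r + half][c - half] + grid[r + half][c + half]) / 4.0
                grid[r][c] = avg + rng.uniform(-scale, scale)

        # Square step: each edge midpoint becomes the mean of its
        # in-bounds axis-aligned neighbours plus a random displacement.
        for r in range(0, size, half):
            start = half if (r // half) % 2 == 0 else 0
            for c in range(start, size, step):
                total, count = 0.0, 0
                for dr, dc in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < size and 0 <= cc < size:
                        total += grid[rr][cc]
                        count += 1
                grid[r][c] = total / count + rng.uniform(-scale, scale)

        # Halve the cell size and shrink the displacement range.
        step = half
        scale *= roughness

    return grid


if __name__ == "__main__":
    heights = diamond_square(3, roughness=0.5, seed=42)
    for row in heights:
        print(" ".join(f"{h:+.2f}" for h in row))
```

Smaller `roughness` values give smoother terrain, while values near 1 keep strong detail at every scale. Fixing `seed` makes the grid reproducible, which is convenient when the same terrain layout must be regenerated.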
