Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Lu Sheng
TL;DR
Ouroboros3D tackles the data bias and cross-view inconsistency inherent in two-stage image-to-3D pipelines by unifying multi-view diffusion and 3D reconstruction into a recursive diffusion framework. It introduces a 3D-aware feedback loop and a self-conditioning strategy to jointly train a diffusion-based multi-view generator (SVD) and a feed-forward 3D reconstructor (LGM), achieving improved geometric consistency and high-quality 3D outputs from a single image. The approach demonstrates superior performance over stage-separated pipelines and inference-time fusion methods on both multi-view and 3D reconstruction tasks, with notable gains in PSNR, SSIM, and LPIPS on standard benchmarks. This framework is extensible to different 3D representations and holds practical potential for rapid, single-image-to-3D content creation with reduced data bias.
Abstract
Existing single image-to-3D creation methods typically involve a two-stage process: first generating multi-view images, then using these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, which degrades the quality of the reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which integrates diffusion-based multi-view image generation and 3D reconstruction into a recursive diffusion process. In our framework, these two modules are jointly trained through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During the multi-view denoising process, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. The recursive diffusion framework with 3D-aware feedback unites the entire process and improves geometric consistency. Experiments show that our framework outperforms both training the two stages separately and existing methods that combine them only at the inference phase. Project page: https://costwen.github.io/Ouroboros3D/
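The recursive sampling loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_denoiser`, `toy_reconstruct_and_render`, and `recursive_sample` are hypothetical NumPy stand-ins for the multi-view diffusion model, the reconstruction module with its renderer, and the sampler, and the noise schedule is a simple linear interpolation chosen only for illustration. The sketch shows the control flow only: at each denoising step, maps rendered from the previous step's reconstruction are fed back as an extra condition.

```python
import numpy as np

def toy_denoiser(noisy_views, t, cond_image, feedback):
    """Hypothetical stand-in for the multi-view diffusion model.

    Returns a prediction of the clean views. `t` is accepted to mirror a
    real denoiser's interface but is unused in this toy version.
    """
    # Pull every view toward the conditioning image.
    pred = 0.5 * noisy_views + 0.5 * cond_image[None]
    # Blend in the 3D-aware maps from the previous step, if any exist.
    if feedback is not None:
        pred = 0.5 * pred + 0.5 * feedback
    return pred

def toy_reconstruct_and_render(pred_views):
    """Hypothetical stand-in for the reconstructor plus renderer.

    A real system would build a 3D representation and render per-view
    3D-aware maps. Here the "3D" estimate is just the mean over views,
    rendered identically into every view.
    """
    shared = pred_views.mean(axis=0, keepdims=True)
    return np.repeat(shared, pred_views.shape[0], axis=0)

def recursive_sample(cond_image, num_views=4, num_steps=10, seed=0):
    """Toy recursive diffusion loop with 3D-aware feedback."""
    rng = np.random.default_rng(seed)
    views = rng.standard_normal((num_views,) + cond_image.shape)
    feedback = None  # no rendered maps exist before the first step
    for step in range(num_steps, 0, -1):
        t = step / num_steps
        # 1) Denoise, conditioned on the previous step's rendered maps.
        pred_clean = toy_denoiser(views, t, cond_image, feedback)
        # 2) Reconstruct from the prediction and render maps for the next step.
        feedback = toy_reconstruct_and_render(pred_clean)
        # 3) Move to the next, lower noise level (assumed linear schedule).
        t_next = (step - 1) / num_steps
        noise = rng.standard_normal(views.shape)
        views = (1.0 - t_next) * pred_clean + t_next * noise
    return views, feedback

if __name__ == "__main__":
    image = np.ones((8, 8))
    views, maps = recursive_sample(image, num_views=4, num_steps=5)
    print(views.shape, maps.shape)
```

In the actual framework the feedback comes from a learned reconstruction module trained jointly with the diffusion model, so both components see each other's outputs during training rather than only at inference.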
