GECO: Generative Image-to-3D within a SECOnd
Chen Wang, Jiatao Gu, Xiaoxiao Long, Yuan Liu, Lingjie Liu
TL;DR
GECO tackles the efficiency bottleneck and the uncertainty of single-image 3D generation with a two-stage distillation pipeline. Stage I distills a pretrained multi-view diffusion model into a one-step multi-view generator. Stage II uses the multi-view output of that generator as pseudo ground truth to fine-tune a reconstruction-based 3D model for cross-view consistency. The result is a feed-forward 3D generator that produces high-quality textured meshes in under a second on a single GPU and outperforms prior fast methods in both texture and geometry accuracy. This makes real-time 3D asset creation from a single image practical, with robust handling of viewpoint uncertainty.
Abstract
Recent years have seen significant advances in 3D generation. Methods based on score distillation achieve impressive results, but they often require extensive per-scene optimization, which limits their time efficiency. Reconstruction-based approaches are more efficient but tend to compromise quality because of their limited ability to handle uncertainty. We introduce GECO, a novel method for high-quality 3D generative modeling that operates within a second. It addresses the uncertainty and inefficiency of existing methods in two stages. In the first stage, we train a single-step multi-view generative model with score distillation. In the second stage, a further distillation step addresses the view inconsistency of the multi-view generation. Together, the two stages balance quality and efficiency in 3D generation. Our comprehensive experiments demonstrate that GECO achieves high-quality image-to-3D mesh generation with an unprecedented level of efficiency. We will make the code and model publicly available.
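The inference-time data flow implied by the two-stage design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `one_step_multiview`, `reconstruct`, `image_to_3d`, and `Mesh`, the choice of four views, and the toy arithmetic inside each stub are all hypothetical placeholders. The only point the sketch makes is structural: each stage is a single forward call, with no per-scene optimization loop and no iterative sampling.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Placeholder image type (assumption): a grayscale image as nested lists.
Image = List[List[float]]


@dataclass
class Mesh:
    """Hypothetical stand-in for a textured mesh; tracks summary values only."""
    num_source_views: int
    mean_intensity: float


def one_step_multiview(image: Image, num_views: int = 4) -> List[Image]:
    """Stand-in for the Stage I one-step multi-view generator.

    A real model would be a distilled network; this stub just returns
    brightness-shifted copies of the input, one per view, in ONE call.
    """
    return [[[p + 0.1 * v for p in row] for row in image] for v in range(num_views)]


def reconstruct(views: Sequence[Image]) -> Mesh:
    """Stand-in for the Stage II feed-forward reconstruction model.

    A real model would output geometry and texture; this stub only
    aggregates pixel values so the data flow can be checked.
    """
    pixels = [p for view in views for row in view for p in row]
    return Mesh(num_source_views=len(views),
                mean_intensity=sum(pixels) / len(pixels))


def image_to_3d(
    image: Image,
    generator: Callable[[Image], List[Image]] = one_step_multiview,
    reconstructor: Callable[[Sequence[Image]], Mesh] = reconstruct,
) -> Mesh:
    """Single image -> multi-view images -> mesh, with one call per stage."""
    views = generator(image)      # Stage I: one forward pass
    return reconstructor(views)   # Stage II: one forward pass


if __name__ == "__main__":
    mesh = image_to_3d([[0.0, 0.0], [0.0, 0.0]])
    print(mesh)
```

Passing the two stages as callables keeps the sketch agnostic about the actual networks: either stub can be swapped for a real model without changing `image_to_3d`.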
