Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion
Fan Yang, Jianfeng Zhang, Yichun Shi, Bowen Chen, Chenxu Zhang, Huichao Zhang, Xiaofeng Yang, Xiu Li, Jiashi Feng, Guosheng Lin
TL;DR
Magic-Boost tackles instability and low detail in 3D generation by introducing a multi-view conditioned diffusion model that leverages pseudo multi-view priors to guide a fast SDS refinement. A time-fixed local feature extractor, cross-view 3D attention, and data augmentation enable robust extraction of 3D priors from inconsistent views, while an Anchor Iterative Update loss stabilizes the refinement. The pipeline converts coarse 3D inputs (e.g., from Instant3D) into differentiable representations (via fast NeRF) and optimizes them with SDS over a short horizon, achieving high-fidelity geometry and textures in around 15 minutes. Empirical results on image-to-3D generation and novel view synthesis demonstrate improved quality, stronger identity preservation, and faster inference. The method is plug-in compatible with various pseudo multi-view priors and backbones.
Abstract
Benefiting from the rapid development of 2D diffusion models, 3D content generation has witnessed significant progress. One promising solution is to finetune pre-trained 2D diffusion models to produce multi-view images and then reconstruct them into 3D assets via feed-forward sparse-view reconstruction models. However, because of the 3D inconsistency in the generated multi-view images and the low reconstruction resolution of the feed-forward reconstruction models, the generated 3D assets still suffer from incorrect geometry and blurry textures. To address this problem, we present a multi-view-based refinement method, named Magic-Boost, to further refine the generation results. In detail, we first propose a novel multi-view conditioned diffusion model that extracts 3D priors from the synthesized multi-view images to synthesize high-fidelity novel-view images. We then introduce a novel iterative-update strategy that uses this model to provide precise guidance for refining the coarse generated results through a fast optimization process. Conditioned on the strong 3D priors extracted from the synthesized multi-view images, Magic-Boost provides precise optimization guidance that aligns well with the coarse generated 3D assets, enriching local detail in both geometry and texture within a short time ($\sim$15 min). Extensive experiments show that Magic-Boost greatly enhances the coarse generated inputs and generates high-quality 3D assets with rich geometric and textural details. (Project Page: https://magic-research.github.io/magic-boost/)
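
To make the optimization-based refinement more concrete, the following is a minimal sketch of a generic score distillation sampling (SDS) update of the kind the TL;DR refers to. It is not the Magic-Boost implementation: the "rendered image" is optimized directly instead of being rendered from a NeRF, `make_toy_denoiser` is a hypothetical stand-in for the multi-view conditioned diffusion model, the weighting `w = 1 - alpha_bar_t` is only one common choice, and the Anchor Iterative Update strategy is not modeled at all.

```python
import numpy as np

def sds_grad(x, denoiser, alpha_bar_t, rng):
    """One-sample SDS gradient with respect to the rendered image x."""
    eps = rng.standard_normal(x.shape)                     # injected noise
    x_t = np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_pred = denoiser(x_t, alpha_bar_t)                  # predicted noise
    w = 1.0 - alpha_bar_t                                  # assumed weighting
    return w * (eps_pred - eps)

def make_toy_denoiser(target):
    """Hypothetical denoiser whose only 'clean' image is `target`.

    A real model would be a network conditioned on the multi-view images.
    """
    def denoiser(x_t, alpha_bar_t):
        return (x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)
    return denoiser

def refine(x0, denoiser, steps=200, lr=0.1, seed=0):
    """Gradient descent on x using SDS gradients at random noise levels."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(steps):
        alpha_bar_t = rng.uniform(0.1, 0.9)                # random timestep
        x -= lr * sds_grad(x, denoiser, alpha_bar_t, rng)
    return x

if __name__ == "__main__":
    coarse = np.zeros((4, 4))                              # coarse "render"
    target = np.ones((4, 4))                               # what the prior prefers
    refined = refine(coarse, make_toy_denoiser(target))
    print("max error after refinement:", np.abs(refined - target).max())
```

In this toy setting the SDS gradient is proportional to `x - target`, so each step pulls the coarse input toward the image favored by the prior. In an actual pipeline the same gradient would be backpropagated through a differentiable renderer into the 3D representation's parameters.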
