IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
Bohan Zeng, Shanglin Li, Yutang Feng, Ling Yang, Hong Li, Sicheng Gao, Jiaming Liu, Conghui He, Wentao Zhang, Jianzhuang Liu, Baochang Zhang, Shuicheng Yan
TL;DR
IPDreamer tackles appearance control in 3D object generation by leveraging complex image prompts alongside textual prompts. It introduces Image Prompt Score Distillation Sampling (IPSDS) and Mask-guided Compositional Alignment to extract rich appearance features and localize them on 3D meshes. The method uses a NeRF-based mesh optimization pipeline with normal-map prompts and cross-attention to align texture and geometry. Empirical results show IPDreamer achieves higher fidelity and better alignment with prompts than state-of-the-art text-to-3D baselines, and the authors release the code.
Abstract
Recent advances in 3D generation have been remarkable, with methods such as DreamFusion leveraging large-scale text-to-image diffusion-based models to guide 3D object generation. These methods enable the synthesis of detailed and photorealistic textured objects. However, the appearance of 3D objects produced by such text-to-3D models is often unpredictable, and single-image-to-3D methods struggle with images that lack a clear subject, which complicates the generation of appearance-controllable 3D objects from complex images. To address these challenges, we present IPDreamer, a novel method that captures intricate appearance features from complex $\textbf{I}$mage $\textbf{P}$rompts and aligns the synthesized 3D object with these extracted features, enabling high-fidelity, appearance-controllable 3D object generation. Our experiments demonstrate that IPDreamer consistently generates high-quality 3D objects that align with both the textual and complex image prompts, highlighting its promising capability in appearance-controlled, complex 3D object generation. Our code is available at https://github.com/zengbohan0217/IPDreamer.
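For context, the diffusion guidance used by DreamFusion is Score Distillation Sampling (SDS). The standard SDS gradient is sketched below in the notation of the DreamFusion paper, which is an assumption here since this abstract defines no symbols: $\theta$ denotes the 3D representation's parameters, $g$ the differentiable renderer, $y$ the text prompt, $\hat{\epsilon}_\phi$ the pretrained noise predictor, and $w(t)$ a timestep weighting.

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi,\, x = g(\theta)\big)
= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t x + \sigma_t \epsilon,\quad \epsilon \sim \mathcal{N}(0, I).
$$

The IPSDS variant named in the TL;DR presumably extends the conditioning beyond $y$ to include features extracted from the image prompt. Its exact form is not stated in this summary, so the equation above should be read only as the text-conditioned baseline.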
