IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts

Bohan Zeng, Shanglin Li, Yutang Feng, Ling Yang, Hong Li, Sicheng Gao, Jiaming Liu, Conghui He, Wentao Zhang, Jianzhuang Liu, Baochang Zhang, Shuicheng Yan

TL;DR

IPDreamer tackles appearance control in 3D object generation by leveraging complex image prompts alongside textual prompts. It introduces Image Prompt Score Distillation Sampling (IPSDS) and Mask-guided Compositional Alignment to extract rich appearance features and localize them on 3D meshes. The method uses a NeRF-based mesh optimization pipeline with normal-map prompts and cross-attention to align texture and geometry. Empirical results show IPDreamer achieves higher fidelity and better alignment with prompts than state-of-the-art text-to-3D baselines, and the authors release the code.

Abstract

Recent advances in 3D generation have been remarkable, with methods such as DreamFusion leveraging large-scale text-to-image diffusion-based models to guide 3D object generation. These methods enable the synthesis of detailed and photorealistic textured objects. However, the appearance of 3D objects produced by such text-to-3D models is often unpredictable, and it is hard for single-image-to-3D methods to deal with images lacking a clear subject, complicating the generation of appearance-controllable 3D objects from complex images. To address these challenges, we present IPDreamer, a novel method that captures intricate appearance features from complex $\textbf{I}$mage $\textbf{P}$rompts and aligns the synthesized 3D object with these extracted features, enabling high-fidelity, appearance-controllable 3D object generation. Our experiments demonstrate that IPDreamer consistently generates high-quality 3D objects that align with both the textual and complex image prompts, highlighting its promising capability in appearance-controlled, complex 3D object generation. Our code is available at https://github.com/zengbohan0217/IPDreamer.
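
For context, the score-distillation gradient that DreamFusion-style methods optimize can be written as below. This is the standard SDS form from DreamFusion, shown only as background; it is not IPDreamer's IPSDS, whose exact formulation is not given on this page. The symbols are the conventional ones and are assumptions relative to this text: $\theta$ are the 3D representation parameters, $x = g(\theta)$ is a rendered view, $y$ is the text prompt, $\hat{\epsilon}_\phi$ is the frozen diffusion model's noise prediction, and $w(t)$ is a timestep weighting.

```latex
% Standard SDS gradient (DreamFusion), given as background only.
% x_t is the noised rendering at diffusion timestep t.
\begin{aligned}
x_t &= \alpha_t \, x + \sigma_t \, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  &= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
     \frac{\partial x}{\partial \theta} \right].
\end{aligned}
```

Because the only conditioning signal here is the prompt $y$, the resulting appearance is steered by text alone, which is the limitation the abstract attributes to text-to-3D methods.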

Paper Structure (32 sections, 10 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: IPDreamer can generate controllable, high-quality 3D objects based on both textual and image prompts. (a) illustrates two high-quality 3D objects with rich details, initialized by the same NeRF model and guided by different complex reference image prompts. (b) demonstrates 3D synthesis under challenging textual conditions, where our method outperforms the existing text-to-3D method ProlificDreamer (Wang et al., 2023).
  • Figure 2: IPDreamer is designed to generate high-quality, appearance-controllable 3D meshes that align with single/multiple complex image prompts.
  • Figure 3: Illustration of the effectiveness of Mask-guided Compositional Alignment.
  • Figure 4: Visualization of localization masks.
  • Figure 5: Generated 3D objects with different image prompts. (a) Image prompts used for coarse NeRF model generation. (b) Renderings of the coarse NeRF models. We show four samples for each textual prompt. In each sample, the top left is a selected complex image prompt, and the bottom left and the right show the high-quality 3D object optimized by IPDreamer from the coarse NeRF model.
  • ...and 7 more figures