Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation

Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan

TL;DR

This paper introduces a novel 3D customization method that personalizes high-fidelity, consistent 3D content from a single image of a subject and a text description within 5 minutes, built on a co-evolution framework designed to reduce the variance of distributions.

Abstract

Recent years have witnessed the strong power of 3D generation models, which offer a new level of creative flexibility by allowing users to guide the 3D content generation process through a single image or natural language. However, it remains challenging for existing 3D generation methods to create subject-driven 3D content across diverse prompts. In this paper, we introduce a novel 3D customization method, dubbed Make-Your-3D, that can personalize high-fidelity and consistent 3D content from only a single image of a subject and a text description within 5 minutes. Our key insight is to harmonize the distributions of a multi-view diffusion model and an identity-specific 2D generative model, aligning them with the distribution of the desired 3D subject. Specifically, we design a co-evolution framework to reduce the variance of distributions, where each model learns from the other through identity-aware optimization and subject-prior optimization, respectively. Extensive experiments demonstrate that our method can produce high-quality, consistent, and subject-specific 3D content with text-driven modifications that are unseen in the subject image.
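
The alternating structure described in the abstract can be illustrated with a deliberately simplified toy. The sketch below is not the paper's implementation: each "model" is reduced to a single scalar mean, the subject is a single scalar observation, and the update rule, the function name `co_evolve`, and all parameters are assumptions made for illustration only. It shows only the general idea of freezing one model while the other learns from its samples, so that both drift toward the subject.

```python
import random


def co_evolve(subject_obs, mv_mean, p2d_mean,
              rounds=40, lr=0.5, n_samples=64, noise=0.05, seed=0):
    """Toy alternating optimization (hypothetical, scalar stand-ins only).

    mv_mean  : stand-in for the multi-view diffusion model's distribution
    p2d_mean : stand-in for the 2D personalized model's distribution
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        # "Identity-aware" step (toy): the multi-view model is frozen.
        # Its samples are conditioned on the subject observation, modeled
        # here as the midpoint of its own mean and the observation.
        views = [0.5 * (mv_mean + subject_obs) + rng.gauss(0.0, noise)
                 for _ in range(n_samples)]
        p2d_mean += lr * (sum(views) / n_samples - p2d_mean)

        # "Subject-prior" step (toy): the 2D model is frozen and its
        # samples pull the multi-view model toward it.
        images = [p2d_mean + rng.gauss(0.0, noise) for _ in range(n_samples)]
        mv_mean += lr * (sum(images) / n_samples - mv_mean)
    return mv_mean, p2d_mean


if __name__ == "__main__":
    mv, p2d = co_evolve(subject_obs=1.0, mv_mean=-2.0, p2d_mean=3.0)
    print(f"multi-view stand-in: {mv:.3f}, 2D stand-in: {p2d:.3f}")
```

In this toy, both scalars start far from the subject value and end close to it and to each other, which mirrors, in a very loose sense, the stated goal of harmonizing the two model distributions with that of the subject.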

Paper Structure

This paper contains 24 sections, 6 equations, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Make-Your-3D can personalize 3D content from only a single image of a subject with text-driven modifications within only 5 minutes.
  • Figure 2: Distribution variance between the wild subject and pre-trained models. Taking a monkey image and the text prompt "with elf ears" as input, the pre-trained 2D personalized model and multi-view diffusion model generate images that fall outside the desired distribution, i.e., the specific monkey with elf ears. To solve this problem, we design a co-evolution framework, including subject-prior and identity-aware optimization, that harmonizes the distributions and achieves the desired 3D assets.
  • Figure 3: The overall framework of our proposed Make-Your-3D. Our framework includes identity-aware optimization of the 2D personalized model and subject-prior optimization of the multi-view diffusion model to approximate the subject distribution. The identity-aware optimization lifts the input image to 3D space through a frozen multi-view diffusion model and optimizes the 2D personalized model using the resulting multi-view images. The subject-prior optimization uses diverse images from the frozen personalized model to infuse the subject-specific prior into the multi-view diffusion model.
  • Figure 4: Visual results of Make-Your-3D on different subjects with customized text inputs. The multi-view results demonstrate that our method can generate 3D assets with high fidelity, 3D consistency, subject preservation, and faithfulness to the text prompts.
  • Figure 5: Qualitative comparisons with DreamBooth3D. We use the same text prompt and only one of the input images used in DreamBooth3D. Note that our method performs better on object details with fewer input images.
  • ...and 12 more figures