DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu

TL;DR

Experiments show that DreamComposer is compatible with state-of-the-art diffusion models for zero-shot novel view synthesis, further enhancing them to generate high-fidelity novel view images with multi-view conditions, ready for controllable 3D object reconstruction and various other applications.

Abstract

Utilizing pre-trained 2D large-scale generative models, recent works are capable of generating high-quality novel views from a single in-the-wild image. However, due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views. In this paper, we present DreamComposer, a flexible and scalable framework that can enhance existing view-aware diffusion models by injecting multi-view conditions. Specifically, DreamComposer first uses a view-aware 3D lifting module to obtain 3D representations of an object from multiple views. Then, it renders the latent features of the target view from the 3D representations with the multi-view feature fusion module. Finally, the target-view features extracted from the multi-view inputs are injected into a pre-trained diffusion model. Experiments show that DreamComposer is compatible with state-of-the-art diffusion models for zero-shot novel view synthesis, further enhancing them to generate high-fidelity novel view images with multi-view conditions, ready for controllable 3D object reconstruction and various other applications.
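
The lifting, rendering, and fusion steps described above can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function names `sample_triplane` and `fuse_views`, the nearest-neighbour plane lookup, and the cosine-based view weighting are all assumptions made for illustration. Ray marching and the injection into the diffusion model are omitted.

```python
import numpy as np

def sample_triplane(planes, points):
    """Query tri-plane features at 3D points in [-1, 1]^3.

    planes: array (3, R, R, C) holding the XY, XZ and YZ feature planes.
    points: array (N, 3).
    Returns (N, C): the sum of the three per-plane features
    (nearest-neighbour lookup, chosen here for brevity).
    """
    res = planes.shape[1]
    # Map coordinates from [-1, 1] to integer pixel indices in [0, res - 1].
    idx = np.rint((points + 1.0) * 0.5 * (res - 1)).astype(int)
    idx = np.clip(idx, 0, res - 1)
    x, y, z = idx[:, 0], idx[:, 1], idx[:, 2]
    return planes[0, x, y] + planes[1, x, z] + planes[2, y, z]

def fuse_views(per_view_feats, input_azimuths_deg, target_azimuth_deg):
    """Blend per-view features (V, N, C) into target-view features (N, C).

    Assumed weighting: input views whose azimuth is closer to the target
    view get larger weights, via (cos(delta) + 1) / 2, then normalised.
    """
    delta = np.deg2rad(np.asarray(input_azimuths_deg) - target_azimuth_deg)
    weights = (np.cos(delta) + 1.0) / 2.0 + 1e-8  # epsilon avoids a zero sum
    weights = weights / weights.sum()
    return np.tensordot(weights, per_view_feats, axes=1)

# Toy usage: two input views, each lifted to its own tri-plane (random here).
rng = np.random.default_rng(0)
planes_per_view = rng.normal(size=(2, 3, 8, 8, 4))  # (views, planes, R, R, C)
points = rng.uniform(-1.0, 1.0, size=(5, 3))        # 3D sample locations
feats = np.stack([sample_triplane(p, points) for p in planes_per_view])
target_feat = fuse_views(feats, [0.0, 180.0], target_azimuth_deg=30.0)
print(target_feat.shape)  # (5, 4)
```

In this toy setup the front view (azimuth 0°) dominates the fused features for a 30° target, while the back view (180°) contributes only a small share. That mirrors the intuition that views nearer the target carry more relevant information.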

Paper Structure

This paper contains 23 sections, 3 equations, 16 figures, 9 tables.

Figures (16)

  • Figure 1: DreamComposer generates controllable novel views and 3D objects by injecting multi-view conditions. We incorporate the method into the pipelines of Zero-1-to-3 [liu2023zero1to3] and SyncDreamer (SyncD) [liu2023syncdreamer] to enhance the controllability of those models.
  • Figure 2: An overview of the DreamComposer pipeline. Given multiple input images from different views, DreamComposer extracts their 2D latent features and uses a 3D lifting module to produce tri-plane 3D representations. Then, the multi-view condition rendered from the 3D representations is injected into the pre-trained diffusion model to provide target-view auxiliary information.
  • Figure 3: Different numbers of ground-truth inputs. Our model can handle varying numbers of ground-truth input views.
  • Figure 4: Qualitative comparisons with Zero-1-to-3 [liu2023zero1to3] in controllable novel view synthesis. DC-Zero-1-to-3 generates more controllable images from novel viewpoints by utilizing conditions from multi-view images.
  • Figure 5: Qualitative comparison with SyncDreamer (SyncD) [liu2023syncdreamer] in controllable novel view synthesis and 3D reconstruction. The image in $\square$ is the main input, and the other image in $\square$ is the conditional input generated from Zero-1-to-3 [liu2023zero1to3]. With more information from multi-view images, DC-SyncDreamer is able to generate more accurate back textures and more controllable 3D shapes.
  • ...and 11 more figures