DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views

Paul Yoo; Jiaxian Guo; Yutaka Matsuo; Shixiang Shane Gu

TL;DR

DreamSparse tackles sparse-view novel-view synthesis by leveraging frozen 2D diffusion priors and injecting 3D awareness through a dedicated geometry module. It introduces a spatial guidance mechanism to translate 3D features into diffusion-input guidance without tuning the diffusion model, augmented by noise perturbation to preserve identity. The approach achieves state-of-the-art results on CO3D for both object- and scene-level NVS, including strong open-set generalization and support for textual style control. These results highlight the practical potential of combining powerful 2D priors with lightweight 3D priors for efficient, generalizable NVS from very few views.

Abstract

Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle to produce high-quality results or require per-object optimization in such few-view settings because the input provides insufficient information. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, however, lack 3D awareness, which leads to distorted image synthesis and compromises object identity. To address these problems, we propose DreamSparse, a framework that enables a frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object- and scene-level images and generalizing to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines on both trained and open-set category images. More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage.
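The idea of guiding a frozen model with a separately trained module can be illustrated with a toy example. The sketch below is not the authors' architecture: it assumes a linear map as a stand-in for the frozen denoiser and a second linear map as a stand-in for the spatial guidance model, and every name in it (W_frozen, G, feats, and so on) is hypothetical. It only shows the training pattern the abstract describes, in which the pre-trained weights stay fixed and only the guidance parameters are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; none of these names or values come from the paper.
D_IMG, D_FEAT, N = 8, 4, 256

# Stand-in for the frozen pre-trained denoiser: a fixed linear map, never updated.
W_frozen = rng.normal(size=(D_IMG, D_IMG)) / np.sqrt(D_IMG)
W_frozen_before = W_frozen.copy()

# Stand-in for the trainable guidance module: maps 3D-aware features
# to a residual that is added to the frozen model's output.
G = np.zeros((D_IMG, D_FEAT))

# Synthetic data: the target depends on geometry features that the
# frozen model alone cannot see.
x_noisy = rng.normal(size=(N, D_IMG))
feats = rng.normal(size=(N, D_FEAT))
G_true = rng.normal(size=(D_IMG, D_FEAT))
target = x_noisy @ W_frozen.T + feats @ G_true.T


def predict(x, f, guidance):
    """Frozen output plus guidance residual."""
    return x @ W_frozen.T + f @ guidance.T


def mse(guidance):
    return float(np.mean((predict(x_noisy, feats, guidance) - target) ** 2))


loss_before = mse(G)

# Plain gradient descent on the guidance parameters only.
lr = 1.0
for _ in range(200):
    err = predict(x_noisy, feats, G) - target   # shape (N, D_IMG)
    grad_G = 2.0 * err.T @ feats / err.size     # d(mean squared error)/dG
    G -= lr * grad_G

loss_after = mse(G)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.2e}")
```

In this toy setting the guidance parameters absorb all of the geometry-dependent signal while W_frozen is never written to, which mirrors the "without tuning it" property claimed in the abstract. A real system would replace both linear maps with neural networks and train against a diffusion denoising objective.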

Paper Structure (33 sections, 6 equations, 8 figures, 8 tables)

Introduction
Related Works
Geometry-based Novel View Synthesis
Sparse-view 3D Reconstruction
Diffusion Model for 3D Reconstruction
Method
3D Geometry Module
Spatial Guidance Module
Noise Perturbation
Experiments
Dataset and Training Details
Competing Methods
Main Results Analysis
Object Level Novel View Synthesis
In-Domain Evaluation
...and 18 more sections

Figures (8)

Figure 1: Qualitative results on novel view synthesis of real-world objects from the CO3D dataset.
Figure 2: Overview of the method. In the first stage, a 3D geometry module estimates 3D structure and aggregates features from the context views. In the second stage, a pre-trained 2D diffusion model conditioned on the aggregated features is used to learn a spatial guidance model that guides the diffusion process toward accurate synthesis of the underlying object.
Figure 3: Novel view synthesis results on open-set category objects with the same context image inputs, where SF denotes SparseFusion [zhou2023sparsefusion] and GT denotes the ground-truth image. More results are given on our project webpage and in the appendix.
Figure 4: Qualitative results of scene-level novel view synthesis (NVS) outputs from all baselines.
Figure 5: Qualitative results of novel view synthesis with text-controlled style transfer.
...and 3 more figures
