MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

Chenjie Cao; Chaohui Yu; Shang Liu; Fan Wang; Xiangyang Xue; Yanwei Fu

TL;DR

MVGenMaster addresses versatile novel view synthesis (NVS) by combining diffusion-based generation with explicit 3D priors derived from metric depth and camera geometry. The method uses a multi-view latent diffusion model conditioned on Plücker ray embeddings and warped 3D priors (RGB pixels and canonical coordinate maps, CCMs) to synthesize many target views from arbitrary reference views in a single forward pass. It is trained on the large-scale MvD-1M dataset. Key contributions include a training-free key-rescaling mechanism that increases the number of generated views without degradation, and targeted training strategies (domain switcher, multi-scale training, EMA) that improve scalability and generalization. Results on in-domain and out-of-domain benchmarks show state-of-the-art NVS performance with improved 3D consistency, extending NVS toward scene-level content and variable-view generation for applications in graphics and AR/VR.
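To make the Plücker ray conditioning concrete, the following is a minimal sketch of how a 6-D Plücker embedding can be computed for a single pixel. It assumes a pinhole camera, a camera-to-world rotation, and the (direction, moment) ordering; the function and parameter names are hypothetical, and the layout used in the paper's implementation may differ.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def matvec(M, v):
    """Multiply a 3x3 matrix (nested sequences) by a 3-vector."""
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))

def plucker_ray(u, v, fx, fy, cx, cy, R_c2w, origin):
    """Return (d, m): unit ray direction d and moment m = origin x d.

    Assumptions (not from the source): pinhole intrinsics (fx, fy, cx, cy),
    R_c2w is a camera-to-world rotation, origin is the camera center in
    world coordinates.
    """
    # Back-project the pixel to a direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the world frame and normalize to unit length.
    d_world = matvec(R_c2w, d_cam)
    norm = math.sqrt(sum(c * c for c in d_world))
    d = tuple(c / norm for c in d_world)
    # The moment encodes where the ray sits in space, independent of
    # which point along the ray is chosen.
    m = cross(origin, d)
    return d + m

identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
ray = plucker_ray(50.0, 50.0, 100.0, 100.0, 50.0, 50.0, identity, (1.0, 0.0, 0.0))
```

In a dense setting, one such 6-D vector would be computed per pixel (or per latent location) and supplied to the network alongside the image features.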

Abstract

We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks. MVGenMaster leverages 3D priors that are warped using metric depth and camera poses, significantly enhancing both generalization and 3D consistency in NVS. Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses with a single forward process. Additionally, we have developed a comprehensive large-scale multi-view image dataset called MvD-1M, comprising up to 1.6 million scenes, equipped with well-aligned metric depth to train MVGenMaster. Moreover, we present several training and model modifications to strengthen the model with scaled-up datasets. Extensive evaluations across in- and out-of-domain benchmarks demonstrate the effectiveness of our proposed method and data formulation. Models and codes will be released at https://github.com/ewrfcas/MVGenMaster/.
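The abstract's idea of warping 3D priors with metric depth and camera poses can be illustrated with the standard per-pixel reprojection below. This is a simplified sketch under stated assumptions (pinhole intrinsics shared by both views, camera-to-world poses given as a rotation and translation, no occlusion handling or splatting); the names are hypothetical and it is not the paper's actual pipeline.

```python
def warp_pixel(u, v, depth, K, ref_c2w, tgt_c2w):
    """Reproject a reference pixel with metric depth into a target view.

    K is (fx, fy, cx, cy); each pose is (R, t) with R a 3x3 camera-to-world
    rotation and t the camera center in world coordinates (assumed convention).
    Returns (u', v', z') in the target view, or None if the point lies
    behind the target camera.
    """
    fx, fy, cx, cy = K
    R_ref, t_ref = ref_c2w
    R_tgt, t_tgt = tgt_c2w

    # 1. Unproject the pixel to a 3D point in the reference camera frame.
    p_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

    # 2. Move the point to world coordinates: X_w = R_ref * X_c + t_ref.
    p_world = tuple(
        sum(R_ref[i][j] * p_cam[j] for j in range(3)) + t_ref[i]
        for i in range(3)
    )

    # 3. Move it into the target camera frame: X_t = R_tgt^T * (X_w - t_tgt).
    diff = tuple(p_world[i] - t_tgt[i] for i in range(3))
    p_tgt = tuple(
        sum(R_tgt[j][i] * diff[j] for j in range(3)) for i in range(3)
    )

    # 4. Project with the pinhole model, discarding points behind the camera.
    if p_tgt[2] <= 0:
        return None
    return (fx * p_tgt[0] / p_tgt[2] + cx,
            fy * p_tgt[1] / p_tgt[2] + cy,
            p_tgt[2])

I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
K = (100.0, 100.0, 50.0, 50.0)
# Target camera shifted 0.5 units along +x relative to the reference camera.
warped = warp_pixel(50.0, 50.0, 2.0, K, (I3, (0.0, 0.0, 0.0)), (I3, (0.5, 0.0, 0.0)))
```

Applying this to every reference pixel yields a sparse warped image in the target view; the same mapping can carry any per-pixel quantity, such as RGB values or 3D coordinates. Metric (rather than relative) depth matters here because the translation between cameras is expressed in real-world units.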
