MegaScenes: Scene-Level View Synthesis at Scale
Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely
TL;DR
The paper addresses the shortage of diverse scene-level training data for 3D-aware novel view synthesis (NVS) and shows that finetuning diffusion-based NVS models on MegaScenes substantially improves generalization to in-the-wild scenes. It introduces MegaScenes, a large-scale dataset of ~430K scenes with ~9M images and ~100K SfM reconstructions derived from Wikimedia Commons, from which over 2M training image pairs with known relative poses are built. The authors augment prior pose-conditioned diffusion models with warp conditioning, which warps the input view into the target view, and additionally condition on extrinsic matrices to enforce correct scale, yielding more pose-consistent and realistic outputs. In in-domain evaluation on MegaScenes and in cross-domain tests on DTU, Mip-NeRF360, and RealEstate10K, the approach improves pose alignment and visual fidelity, supporting the effectiveness of both the dataset and the method for scene-level NVS.
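To make the idea of warping the input view into the target view concrete, the following is a minimal NumPy sketch of depth-based forward warping. It is an illustration under stated assumptions, not the authors' implementation: the function name `warp_to_target`, the shared pinhole intrinsics `K`, the per-pixel source depth map, and the nearest-pixel splatting with a depth test are all choices made here for the example.

```python
import numpy as np


def warp_to_target(image, depth, K, T_src_to_tgt):
    """Forward-warp a source image into a target view (illustrative sketch).

    image:        (H, W, 3) array of source colors.
    depth:        (H, W) array of positive depths in the source camera frame.
    K:            (3, 3) pinhole intrinsics, assumed shared by both views.
    T_src_to_tgt: (4, 4) rigid transform from source to target camera coordinates.
    Returns the warped image and a boolean mask of target pixels that were filled.
    """
    H, W = depth.shape

    # Homogeneous pixel coordinates of every source pixel, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(np.float64)

    # Unproject to 3D points in the source camera frame.
    pts_src = (np.linalg.inv(K) @ pix) * depth.ravel()

    # Move the points into the target camera frame.
    R, t = T_src_to_tgt[:3, :3], T_src_to_tgt[:3, 3:4]
    pts_tgt = R @ pts_src + t

    # Project into the target image and round to the nearest pixel.
    z = pts_tgt[2]
    valid = z > 1e-6
    safe_z = np.where(valid, z, 1.0)
    proj = K @ pts_tgt
    x = np.rint(proj[0] / safe_z).astype(int)
    y = np.rint(proj[1] / safe_z).astype(int)
    valid &= (x >= 0) & (x < W) & (y >= 0) & (y < H)

    # Depth test: sort near to far, then keep the first (nearest) point per pixel.
    idx = np.flatnonzero(valid)
    idx = idx[np.argsort(z[idx], kind="stable")]
    lin = y[idx] * W + x[idx]
    _, first = np.unique(lin, return_index=True)

    colors = image.reshape(-1, 3)
    warped = np.zeros((H * W, 3), dtype=image.dtype)
    mask = np.zeros(H * W, dtype=bool)
    warped[lin[first]] = colors[idx[first]]
    mask[lin[first]] = True
    return warped.reshape(H, W, 3), mask.reshape(H, W)
```

With an identity transform the sketch returns the input image unchanged and a fully filled mask. For any other pose, unfilled pixels in the mask mark regions that the source view does not cover. Note that the projected pixel shift depends on the ratio of translation to depth, so the depth map and the translation in the extrinsics must be expressed in the same scale for the warp to land in the right place.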
Abstract
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse), or of object-centric scenes with limited pose distributions (DTU, CO3D). In this paper, we create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world. Internet photos represent a scalable data source but come with challenges such as lighting and transient objects. We address these issues to further create a subset suitable for the task of NVS. Additionally, we analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency. Through extensive experiments, we validate the effectiveness of both our dataset and method on generating in-the-wild scenes. For details on the dataset and code, see our project page at https://megascenes.github.io.
