
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image

Pengfei Wang, Liyi Chen, Zhiyuan Ma, Yanjun Guo, Guowen Zhang, Lei Zhang

TL;DR

One2Scene is a framework that generates explorable 3D scenes from a single image by decomposing this ill-posed problem into three tractable sub-tasks: anchor view generation, feed-forward reconstruction of a 3D geometric scaffold, and scaffold-conditioned novel view synthesis. It works stably under large camera motions, supporting immersive scene exploration.

Abstract

Generating explorable 3D scenes from a single image is a highly challenging problem in 3D vision. Existing methods struggle to support free exploration, often producing severe geometric distortions and noisy artifacts when the viewpoint moves far from the original perspective. We introduce One2Scene, an effective framework that decomposes this ill-posed problem into three tractable sub-tasks to enable immersive explorable scene generation. We first use a panorama generator to produce anchor views from a single input image as initialization. Then, we lift these 2D anchors into an explicit 3D geometric scaffold via a generalizable, feed-forward Gaussian Splatting network. Instead of treating the panorama as a single image for reconstruction, we project it into multiple sparse anchor views and reformulate the reconstruction task as multi-view stereo matching, which allows us to leverage robust geometric priors learned from large-scale multi-view datasets. A bidirectional feature fusion module is used to enforce cross-view consistency, yielding an efficient and geometrically reliable scaffold. Finally, the scaffold serves as a strong prior for a novel view generator to produce photorealistic and geometrically accurate views at arbitrary cameras. By explicitly conditioning on a 3D-consistent scaffold to perform reconstruction, One2Scene works stably under large camera motions, supporting immersive scene exploration. Extensive experiments show that One2Scene substantially outperforms state-of-the-art methods in panorama depth estimation, feed-forward 360° reconstruction, and explorable 3D scene generation. Code and models will be released.
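
The abstract's step of projecting a panorama into multiple sparse anchor views can be illustrated with a small sketch. It is not the authors' implementation: it assumes an equirectangular panorama, a pinhole camera with a 90° field of view, nearest-neighbour sampling, and a cube-face layout of six views, none of which the abstract specifies. The names `perspective_from_equirect` and `cube_anchor_views` are hypothetical.

```python
import numpy as np

def perspective_from_equirect(pano, yaw_deg, pitch_deg, fov_deg=90.0, size=64):
    """Sample a pinhole view from an equirectangular panorama (nearest neighbour).

    Assumed conventions: camera looks along +z, x is right, y is down;
    positive yaw turns right, positive pitch looks up.
    """
    h, w = pano.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels
    c = (np.arange(size) + 0.5) - size / 2.0              # pixel centres about the principal point
    x, y = np.meshgrid(c, c)
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)  # one ray per output pixel
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(pitch), -np.sin(pitch)],
                   [0.0, np.sin(pitch), np.cos(pitch)]])
    ry = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
    dirs = dirs @ (ry @ rx).T                              # rotate rays into the panorama frame

    lon = np.arctan2(dirs[..., 0], dirs[..., 2])           # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(-dirs[..., 1], -1.0, 1.0))     # latitude, up is -y
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w    # wrap horizontally
    v = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return pano[v, u]

def cube_anchor_views(pano, size=64):
    """Six cube-face views (assumed layout): four around the horizon, plus up and down."""
    poses = [(0, 0), (90, 0), (180, 0), (270, 0), (0, 90), (0, -90)]
    return [perspective_from_equirect(pano, yaw, pitch, 90.0, size) for yaw, pitch in poses]

# Toy panorama whose pixel value equals its column index (1 column per degree).
pano = np.tile(np.arange(360), (180, 1))
views = cube_anchor_views(pano)
front, right = views[0], views[1]
```

Each resulting view has known intrinsics and a known rotation relative to the panorama centre, which is what allows a set of such views to be treated as a posed multi-view input rather than as a single image.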

Paper Structure

This paper contains 23 sections, 14 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Comparison on large-viewpoint novel view synthesis. Existing methods such as WonderJourney [wonderjourney] and DreamScene360 [zhou2024dreamscene360] exhibit clear geometric distortions and artifacts, while our method generates photorealistic and geometrically accurate novel views. The input image is highlighted by a red bounding box; the other images are novel views.
  • Figure 2: Overview of One2Scene. Our method consists of three stages: (a) an anchor view generation stage to establish an initial 360-degree representation, (b) a feed-forward 3D Gaussian Splatting stage to construct an explicit 3D geometric scaffold, and (c) a synthesis stage that leverages the scaffold information to produce high-quality novel views. The pipeline enables geometrically consistent and photorealistic novel view synthesis from a single input image.
  • Figure 3: Qualitative comparison. Our method retains compelling visual quality and generates plausible continuations of the scene, even under large viewpoint change.
  • Figure 4: Ablation study on reconstruction performance. We compare 3D scene generation quality when our feed-forward network is replaced with AnySplat. Top row: reconstruction results. Bottom row: generation results using our model.
  • Figure A1: Qualitative comparison for the ablation study. (a) Rendered views from our 3D scaffold. (b) Naive concatenation baseline. (c) Ours (Dual-LoRA training only). (d) Ours (Full model with memory condition).
  • ...and 6 more figures