NavCrafter: Exploring 3D Scenes from a Single Image

Hongbo Duan, Peiyu Zhuang, Yi Liu, Zhengyang Zhang, Yuxin Zhang, Pengting Luo, Fangming Liu, Xueqian Wang

Abstract

Creating flexible 3D scenes from a single image is vital when direct 3D data acquisition is costly or impractical. We introduce NavCrafter, a novel framework that explores 3D scenes from a single image by synthesizing novel-view video sequences with camera controllability and temporal-spatial consistency. NavCrafter leverages video diffusion models to capture rich 3D priors and adopts a geometry-aware expansion strategy to progressively extend scene coverage. To enable controllable multi-view synthesis, we introduce a multi-stage camera control mechanism that conditions diffusion models with diverse trajectories via dual-branch camera injection and attention modulation. We further propose a collision-aware camera trajectory planner and an enhanced 3D Gaussian Splatting (3DGS) pipeline with depth-aligned supervision, structural regularization and refinement. Extensive experiments demonstrate that NavCrafter achieves state-of-the-art novel-view synthesis under large viewpoint shifts and substantially improves 3D reconstruction fidelity.

Paper Structure

This paper contains 38 sections, 14 equations, 6 figures, 6 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Visual results generated by NavCrafter. Given a single image, NavCrafter reconstructs 3D scenes from the camera-guided video diffusion model.
  • Figure B1: The NavCrafter framework consists of three modules: (1) Controllable novel-view synthesis via video diffusion, integrating camera trajectories to control video generation and achieve temporally consistent novel views; (2) Iterative view synthesis with collision-aware camera trajectory planning, avoiding scene collisions and optimizing camera trajectories; (3) Geometry-aware 3D reconstruction with enhanced 3D Gaussian Splatting, incorporating depth-aligned supervision, structural regularization and image diffusion model refinement.
  • Figure D1: Qualitative comparison with prior methods in controllable novel view synthesis, where the first column shows the input image and camera trajectory. Blue bounding boxes indicate reference areas for easier comparison, while orange ones highlight low-quality generations.
  • Figure E1: Qualitative comparison with prior methods in 3D scene reconstruction, where blue bounding boxes show visible regions derived from input image and yellow bounding boxes highlight low-quality regions.
  • Figure E2: Comparison of reconstruction quality between Ours and ViewCrafter.
  • ...and 1 more figure
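
The three-module pipeline described in Figure B1 (camera-controlled novel-view synthesis, iterative expansion with collision-aware trajectory planning, and 3DGS reconstruction) can be sketched at a high level as follows. This is a minimal structural sketch only: every function name and data shape here is a hypothetical stand-in, not the authors' actual code or API.

```python
# High-level sketch of a NavCrafter-style loop.
# All names below are illustrative placeholders, not the real implementation.

def plan_trajectory(scene_state):
    # Stand-in for the collision-aware camera trajectory planner;
    # the real planner would avoid scene geometry. Here: 4 dummy poses.
    return [{"pos": (0.0, 0.0, float(i)), "look_at": (0.0, 0.0, 0.0)}
            for i in range(4)]

def synthesize_views(image, trajectory):
    # Stand-in for the camera-conditioned video diffusion model
    # (dual-branch camera injection + attention modulation in the paper).
    return [{"pose": pose, "frame": image} for pose in trajectory]

def reconstruct_3dgs(views):
    # Stand-in for the enhanced 3D Gaussian Splatting stage with
    # depth-aligned supervision, structural regularization, and refinement.
    return {"num_views": len(views)}

def navcrafter_pipeline(image, num_iters=2):
    # Geometry-aware expansion: each iteration plans a new trajectory,
    # synthesizes more views, and re-runs reconstruction on all views.
    scene, views = None, []
    for _ in range(num_iters):
        trajectory = plan_trajectory(scene)
        views += synthesize_views(image, trajectory)
        scene = reconstruct_3dgs(views)
    return scene

print(navcrafter_pipeline("input.png")["num_views"])  # 8 with the defaults above
```

The point of the loop structure is that planning and synthesis feed back into each other: each reconstruction pass informs the next trajectory, which is how coverage is progressively extended from a single input image.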