WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann
TL;DR
WonderJourney introduces a modular pipeline for perpetual 3D scene generation: a user starts from any location, given as text or an image, and traverses a long sequence of diverse yet coherent scenes. The pipeline combines an LLM that writes scene descriptions, a text-driven visual module that generates colored 3D point clouds, and a VLM that validates each generated scene and can trigger regeneration. The approach addresses depth continuity, boundary artifacts, and disocclusion through depth refinement, perspective unprojection, and outpainting guided by the scene descriptions. In human studies, the generated journeys are diverse and high quality, and they outperform baselines such as InfiniteNature-Zero and SceneScape. The framework is flexible and training-free, so it can take advantage of advances in language and vision models for creative 3D content generation.
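As an illustration of the perspective unprojection step mentioned above, the following minimal sketch lifts a depth map and its colors into a colored point cloud. It assumes a simple pinhole camera; the function name `unproject` and the parameters `focal`, `cx`, and `cy` are hypothetical and are not taken from the paper, whose exact camera model is not stated here.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def unproject(
    depth: List[List[float]],
    colors: List[List[Color]],
    focal: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift each pixel (u, v) with depth z to a 3D point (x, y, z, color).

    Assumes a pinhole camera with one focal length (in pixels) and
    principal point (cx, cy). These are illustrative assumptions.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Invert the pinhole projection u = focal * x / z + cx.
            x = (u - cx) * z / focal
            y = (v - cy) * z / focal
            points.append((x, y, z, colors[v][u]))
    return points
```

Pixels with larger depth spread farther from the optical axis, which is why errors in estimated depth translate directly into geometric distortion in the point cloud.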
Abstract
We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (given by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys". Project website: https://kovenyu.com/WonderJourney/
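The three modules described above (an LLM that describes the next scene, a text-driven generator, and a VLM that verifies the result) can be pictured as a describe, generate, verify loop. The sketch below is only a schematic under stated assumptions: `generate_journey`, the three callables, and `max_retries` are hypothetical placeholders, not the authors' API, and the choice to stop when verification keeps failing is an illustrative design decision.

```python
from typing import Any, Callable, List, Optional, Tuple


def generate_journey(
    start: str,
    num_scenes: int,
    describe_next: Callable[[List[str]], str],          # stands in for the LLM
    generate_scene: Callable[[str, Optional[Any]], Any],  # stands in for the visual module
    verify_scene: Callable[[Any], bool],                 # stands in for the VLM check
    max_retries: int = 3,                                # hypothetical retry budget
) -> List[Tuple[str, Any]]:
    """Return a list of (description, scene) pairs that passed verification."""
    journey: List[Tuple[str, Any]] = []
    history: List[str] = [start]
    for _ in range(num_scenes):
        description = describe_next(history)
        previous = journey[-1][1] if journey else None
        accepted = None
        for _attempt in range(max_retries):
            candidate = generate_scene(description, previous)
            if verify_scene(candidate):
                accepted = candidate
                break  # verified, so no regeneration is needed
        if accepted is None:
            break  # assumption: stop rather than keep an unverified scene
        journey.append((description, accepted))
        history.append(description)
    return journey
```

Because each module is passed in as a callable, any of them can be swapped for a newer model without changing the loop, which reflects the modular, training-free design the text describes.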
