WonderJourney: Going from Anywhere to Everywhere

Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann

TL;DR

WonderJourney introduces a modular pipeline for perpetual 3D scene generation, enabling a user to start from any location via text or image and traverse through a long sequence of diverse yet coherent scenes. It combines an LLM for scene descriptions, a text-driven visual module to generate colored 3D point clouds, and a VLM for validation with regeneration capabilities. The approach addresses depth continuity, boundary artifacts, and disocclusion through depth refinement, perspective unprojection, and outpainting guided by scene descriptions. Experimental results show diverse, high-quality journeys that outperform baselines like InfiniteNature-Zero and SceneScape in human studies. The work offers a flexible, training-free framework that can leverage advancing language and vision models for creative 3D content generation.

Abstract

We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys". Project website: https://kovenyu.com/WonderJourney/
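The interplay of the three modules (an LLM that describes the next scene, a visual module that builds it, and a VLM that checks it and triggers regeneration) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `Scene`, `generate_journey`, the three callables, and `max_attempts` are hypothetical names, and keeping the last attempt once the retry budget is exhausted is an assumed policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Scene:
    """Hypothetical container for one generated scene."""
    description: str
    points: list  # stand-in for a colored 3D point cloud


def generate_journey(
    start: str,
    describe_next: Callable[[List[str]], str],            # stands in for the LLM
    build_scene: Callable[[str, Optional[Scene]], Scene],  # stands in for the visual module
    is_valid: Callable[[Scene], bool],                     # stands in for the VLM check
    num_scenes: int,
    max_attempts: int = 3,                                 # assumed retry budget
) -> List[Scene]:
    history = [start]
    journey: List[Scene] = []
    for _ in range(num_scenes):
        # Ask the language model for the next scene, given the journey so far.
        description = describe_next(history)
        previous = journey[-1] if journey else None
        # Build the scene, then regenerate while validation rejects it.
        scene = build_scene(description, previous)
        attempts = 1
        while not is_valid(scene) and attempts < max_attempts:
            scene = build_scene(description, previous)
            attempts += 1
        # Assumption: keep the last attempt even if it never passed validation.
        journey.append(scene)
        history.append(description)
    return journey
```

Passing the modules in as callables mirrors the training-free, modular design described above: any of the three can be swapped for a newer model without touching the loop.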

Paper Structure

This paper contains 17 sections, 7 equations, 13 figures, and 1 table.

Figures (13)

  • Figure 1: We propose WonderJourney, which generates a sequence of diverse yet coherent 3D scenes from a text description or an arbitrary image such as photos or generated art ("from anywhere"). WonderJourney can generate various journeys (which we refer to as "wonderjourneys") for a fixed input, potentially ending "everywhere" (Fig. 4). We show rendered images along the generated sequence of 3D scenes. We strongly encourage the reader to see video examples at https://kovenyu.com/WonderJourney/.
  • Figure 2: The proposed WonderJourney framework and workflow across modules. Our modular design does not require any training, allowing easy future improvements from the quick advances in vision and language models.
  • Figure 3: The visual scene generation module. Each arrow represents a parametric vision model (e.g., a depth estimator) or an operation (e.g., rendering). Our fully modular design easily benefits from advances in the corresponding research topics.
  • Figure 4: Qualitative results for diverse journeys generated from the same input image, showing that WonderJourney can go everywhere. The input in the top example is a real photo.
  • Figure 5: From diverse starting scenes with different styles, WonderJourney generates a sequence of diverse yet coherent 3D scenes, showing that it can go from anywhere to everywhere (e.g., nature, village, city, indoor, or fantasy). The inputs in the top two rows are real photos. We strongly encourage the reader to see the video results on the project website.
  • ...and 8 more figures
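The summary mentions perspective unprojection, and Figure 3 lists a depth estimator among the visual module's components. As a rough illustration of how a depth map becomes a colored point cloud, here is a minimal sketch using a standard pinhole camera model. The function name `unproject`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the rule of skipping non-positive depths are assumptions for illustration; the paper's actual procedure may differ.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def unproject(
    depth: Sequence[Sequence[float]],  # depth[v][u]: distance along the camera z-axis
    colors: Sequence[Sequence[Color]],  # colors[v][u]: RGB value of the same pixel
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift every pixel with positive depth to a 3D point in camera space."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # assumption: non-positive depth marks an invalid pixel
            # Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy, solved for x and y.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, colors[v][u]))
    return points
```

For example, with `fx = fy = 2` and `cx = cy = 0.5`, the pixel at `(u, v) = (1, 1)` with depth 4 maps to the camera-space point `(1.0, 1.0, 4.0)`.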