
WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation

Amogh Joshi, Julian Ost, Felix Heide

Abstract

Unbounded 3D world generation is emerging as a foundational task for scene modeling in computer vision, graphics, and robotics. In this work, we present WorldFlow3D, a novel method capable of generating unbounded 3D worlds. Building upon a foundational property of flow matching, namely that it defines a path of transport between two data distributions, we model 3D generation more generally as a problem of flowing through 3D data distributions, not limited to conditional denoising. We find that our latent-free flow approach generates causal and accurate 3D structure, and can use this as an intermediate distribution to guide the generation of more complex structure and high-quality texture, all while converging more rapidly than existing methods. We enable controllability over generated scenes through vectorized scene layout conditions for geometric structure and scene attributes for visual texture. We confirm the effectiveness of WorldFlow3D on both real outdoor driving scenes and synthetic indoor scenes, validating cross-domain generalizability and high-quality generation on real data distributions. We demonstrate favorable scene generation fidelity over existing approaches in all tested settings for unbounded scene generation. For more, see https://light.princeton.edu/worldflow3d.
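The transport property the abstract builds on can be illustrated with a minimal flow-matching training pair. The sketch below is a generic illustration of flow matching along a linear interpolation path between two distributions (e.g., noise and a coarse-geometry distribution); the function names and toy data are our own assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_interpolant(x0, x1, t):
    """Straight-line transport path between samples of two distributions."""
    return (1.0 - t) * x0 + t * x1

def fm_training_pair(x0, x1, t):
    """Return (x_t, target velocity) for the flow-matching regression loss.

    A velocity network v_theta(x_t, t) would be trained to match v_target;
    chaining such flows (noise -> coarse -> fine) is the idea sketched here.
    """
    x_t = linear_interpolant(x0, x1, t)
    v_target = x1 - x0  # constant velocity along the linear path
    return x_t, v_target

# toy example: transport noise toward a stand-in "data" distribution
x0 = rng.standard_normal((4, 8))        # source samples (noise)
x1 = rng.standard_normal((4, 8)) + 3.0  # target samples (shifted Gaussian)
t = rng.uniform(size=(4, 1))            # one time per sample
x_t, v = fm_training_pair(x0, x1, t)

# the interpolant recovers the endpoints at t = 0 and t = 1
assert np.allclose(linear_interpolant(x0, x1, 0.0), x0)
assert np.allclose(linear_interpolant(x0, x1, 1.0), x1)
```

Because the path endpoints are arbitrary distributions rather than a fixed noise prior, the same regression target applies at every stage of a noise-to-coarse-to-fine chain.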


Paper Structure

This paper contains 27 sections, 7 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: WorldFlow3D is a novel method for the generation of unbounded 3D worlds. We show the capabilities of WorldFlow3D for the generation of large-scale outdoor and indoor scenes, with insets showing learned distributions of fine geometric detail and realistic texture.
  • Figure 2: WorldFlow3D decomposes generation into a sequence of independent flows over progressively richer representations, transporting from noise, through coarse geometry, into fine geometry and visual appearance. All flows operate directly in raw volumetric space, enabling a latent-free, hierarchical scene generation procedure. Generation is controlled by a vectorized geometric layout and discrete scene attributes, giving consistent structural and semantic control at every level.
  • Figure 3: Feather weighted velocity averaging in overlapping chunk regions significantly improves the generated geometry for unbounded generations, as shown above.
  • Figure 4: Qualitative comparison on outdoor scene generation with WorldFlow3D and baseline methods trained on the Waymo dataset [sun2020wod]. We showcase scenes generated at moderate scales, along with close-up views of specific details including buildings and vehicles. We obtain high-quality, realistic geometry and smooth surfaces with fine detail, as seen in coherent building structures, smooth road surfaces, and distinct vehicle geometry.
  • Figure 5: Qualitative comparison on indoor scene generation with WorldFlow3D and baseline methods trained on the 3D-FRONT dataset [3dfront]. We showcase generations of regions including (potentially multiple) rooms with various objects. Our generations are high-fidelity and contain smooth surfaces and realistic geometry.
  • ...and 2 more figures
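Figure 3 names feather-weighted velocity averaging in overlapping chunk regions. The paper's exact scheme is not given in this front matter, so the following is a 1D illustrative sketch under our own assumptions: weights ramp linearly toward each chunk edge, and velocities from overlapping chunks are combined by a weighted average (a crossfade across the overlap).

```python
import numpy as np

def feather_weights(n, overlap):
    """Per-sample weights: 1 in the chunk interior, ramping down toward the edges.

    The ramp stays strictly positive so a sample covered by only one
    chunk still reproduces that chunk's value exactly.
    """
    w = np.ones(n)
    ramp = (np.arange(overlap) + 1) / (overlap + 1)  # e.g. 1/3, 2/3 for overlap=2
    w[:overlap] = ramp
    w[n - overlap:] = ramp[::-1]
    return w

def blend_chunks(v_a, v_b, overlap):
    """Blend two 1D velocity chunks whose last/first `overlap` samples coincide."""
    out_len = len(v_a) + len(v_b) - overlap
    num = np.zeros(out_len)
    den = np.zeros(out_len)
    w_a = feather_weights(len(v_a), overlap)
    w_b = feather_weights(len(v_b), overlap)
    # accumulate weighted velocities, then normalize by total weight
    num[:len(v_a)] += w_a * v_a
    den[:len(v_a)] += w_a
    num[out_len - len(v_b):] += w_b * v_b
    den[out_len - len(v_b):] += w_b
    return num / den

# two constant-velocity chunks of length 6 sharing a 2-sample overlap
out = blend_chunks(np.ones(6), np.zeros(6), overlap=2)
```

Outside the overlap each chunk's velocity is returned unchanged; inside it, the result transitions smoothly from one chunk's value to the other's, avoiding the hard seams a naive hand-off would produce at chunk boundaries.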