Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

Dongsu Zhang, Francis Williams, Zan Gojcic, Karsten Kreis, Sanja Fidler, Young Min Kim, Amlan Kar

TL;DR

This work tackles the challenge of generating high-fidelity, large-scale outdoor 3D scenes from sparse LiDAR data for simulation purposes. It introduces hierarchical Generative Cellular Automata (hGCA), a two-stage, coarse-to-fine model where a coarse GCA conditioned by a light-weight BEV planner creates a global, low-resolution layout, followed by a high-resolution upsampling stage using a continuous GCA with local implicit surfaces to produce a detailed mesh. Across synthetic datasets and real-world Waymo data, hGCA demonstrates superior extrapolation fidelity and sim-to-real generalization relative to semantic scene completion and indoor-scene baselines, while also producing novel content guided by geometric cues. The approach promises scalable, simulation-ready environment generation from AV sensing data, though it currently omits textures and has slower runtimes, signaling avenues for future optimization and realism enhancements.

Abstract

We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AV). In contrast to prior work on AV scene completion, we aim to extrapolate fine geometry from unlabeled LiDAR scans and beyond their spatial limits, taking a step towards generating realistic, high-resolution, simulation-ready 3D street environments. We propose hierarchical Generative Cellular Automata (hGCA), a spatially scalable conditional 3D generative model that grows geometry recursively with local kernels in a coarse-to-fine manner and is equipped with a light-weight planner to induce global consistency. Experiments on synthetic scenes show that hGCA generates plausible scene geometry with higher fidelity and completeness than state-of-the-art baselines. Our model generalizes strongly from sim-to-real, qualitatively outperforming baselines on the Waymo-open dataset. We also show anecdotal evidence of the ability to create novel objects from real-world geometric cues even when trained on limited synthetic content. More results and details can be found at https://research.nvidia.com/labs/toronto-ai/hGCA/.
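
The idea of growing geometry recursively with local kernels can be illustrated with a toy sketch. This is not the authors' implementation: the learned transition network is replaced by a hypothetical hand-written rule (`toy_occupancy_prob`), the planner and the coarse-to-fine upsampling stage are omitted, and the choice to re-insert the input voxels after every step is an assumption made for illustration. All function names and parameters below are invented for this example.

```python
import random
from itertools import product
from typing import Set, Tuple

Voxel = Tuple[int, int, int]

# Offsets of the 3x3x3 neighborhood around a voxel (including the voxel itself).
OFFSETS = list(product((-1, 0, 1), repeat=3))


def neighborhood(state: Set[Voxel]) -> Set[Voxel]:
    """All cells within one voxel of any occupied cell."""
    return {(x + dx, y + dy, z + dz) for (x, y, z) in state for dx, dy, dz in OFFSETS}


def toy_occupancy_prob(cell: Voxel, state: Set[Voxel]) -> float:
    """Hypothetical stand-in for a learned local kernel:
    the probability rises with the number of occupied neighbors."""
    x, y, z = cell
    count = sum(
        (x + dx, y + dy, z + dz) in state
        for dx, dy, dz in OFFSETS
        if (dx, dy, dz) != (0, 0, 0)
    )
    return min(1.0, count / 4)


def gca_step(state: Set[Voxel], rng: random.Random, bounds: Voxel) -> Set[Voxel]:
    """One stochastic transition: only cells near the current state are sampled."""
    new_state = set()
    for cell in sorted(neighborhood(state)):  # sorted for reproducible sampling
        if not all(0 <= c < b for c, b in zip(cell, bounds)):
            continue
        if rng.random() < toy_occupancy_prob(cell, state):
            new_state.add(cell)
    return new_state


def grow(seed: Set[Voxel], steps: int, bounds: Voxel, rng_seed: int = 0) -> Set[Voxel]:
    """Apply the local transition repeatedly, starting from the input voxels."""
    rng = random.Random(rng_seed)
    state = set(seed)
    for _ in range(steps):
        # Assumption for this sketch: observed input voxels are always kept.
        state = gca_step(state, rng, bounds) | set(seed)
    return state


if __name__ == "__main__":
    result = grow({(2, 2, 0)}, steps=3, bounds=(5, 5, 3))
    print(len(result), "occupied voxels after 3 steps")
```

Because each step only touches cells adjacent to the current occupied set, the cost scales with the size of the surface rather than the full volume, which is the intuition behind the spatial scalability claimed above.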

Paper Structure (33 sections, 11 equations, 24 figures, 9 tables)

Figures (24)

  • Figure 1: Geometry generated by hGCA (blue) from five accumulated LiDAR scans (yellow spheres) on the real-world Waymo-open dataset. hGCA is a conditional 3D generative model that can generate geometry beyond occlusions (vehicles, facades) and beyond the input field of view (roofs, trees, poles) from sparse and noisy LiDAR scans. Our method is also spatially scalable, completing this whole scene (120 meters) at high resolution on a single 24GB GPU without additional tricks.
  • Figure 2: Overview of our method. Given several LiDAR scans, our method generates a low-resolution completion $s^{T_1}$ using a GCA with an attached planner that adds global consistency. Then, given $s^{T_1}$ and the input, we upsample the completion using a cGCA into high-resolution voxels with local latents $x^{T_2}$ and decode them to obtain the final generated mesh.
  • Figure 3: Left: (a) Input LiDAR scans. (b), (c) GCA completion in $\text{10cm}^3$ and $\text{20cm}^3$ voxel resolution. (d) GCA + planner completion in 20cm voxel resolution. GCA is local and often cannot capture the global context, generating imperfect completions (pink box) or artifacts (green box).
  • Figure 4: Illustration of the GCA with the attached planner module.
  • Figure 5: Visualizations on CARLA (first 2 columns) and Karton City (last 2 columns) from 5 scans. hGCA generates high-resolution geometry beyond the field of view (bus stops, trees, roofs) and behind occlusions (cars), unlike existing baselines. Deterministic baselines tend to conservatively complete only high-confidence regions near the input. Green boxes show inconsistencies in building interiors in the GT data.
  • ...and 19 more figures