La La LiDAR: Large-Scale Layout Generation from LiDAR Data

Youquan Liu, Lingdong Kong, Weidong Yang, Xin Li, Ao Liang, Runnan Chen, Ben Fei, Tongliang Liu

TL;DR

La La LiDAR tackles the need for controllable 3D LiDAR scene generation by introducing a layout-guided diffusion framework that explicitly models foreground object relations through scene graphs. The two-stage approach first generates semantically consistent layouts via a scene-graph diffusion process, then synthesizes foreground point clouds and completes the full scene with a Foreground-aware Control Injector (FCI) that conditions background generation on foreground structure. The work provides two large-scale LiDAR scene graph datasets (Waymo-SG and nuScenes-SG) and new evaluation metrics, and reports state-of-the-art performance in LiDAR layout fidelity, scene realism, and downstream perception tasks such as segmentation, object detection, and completion. This approach enables fine-grained, relation-aware control over driving scenarios, with strong implications for autonomous driving simulation, safety validation, and data augmentation.

Abstract

Controllable generation of realistic LiDAR scenes is crucial for applications such as autonomous driving and robotics. While recent diffusion-based models achieve high-fidelity LiDAR generation, they lack explicit control over foreground objects and spatial relationships, limiting their usefulness for scenario simulation and safety validation. To address these limitations, we propose Large-scale Layout-guided LiDAR generation model ("La La LiDAR"), a novel layout-guided generative framework that introduces semantic-enhanced scene graph diffusion with relation-aware contextual conditioning for structured LiDAR layout generation, followed by foreground-aware control injection for complete scene generation. This enables customizable control over object placement while ensuring spatial and semantic consistency. To support our structured LiDAR generation, we introduce Waymo-SG and nuScenes-SG, two large-scale LiDAR scene graph datasets, along with new evaluation metrics for layout synthesis. Extensive experiments demonstrate that La La LiDAR achieves state-of-the-art performance in both LiDAR generation and downstream perception tasks, establishing a new benchmark for controllable 3D scene generation.
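To make the scene-graph idea concrete, here is a minimal Python sketch of a layout in which nodes are foreground objects and edges are spatial relations. It is an illustration only, not the authors' implementation: the `Box` and `SceneGraph` classes, the box parameterization, the `infer_relation` rule, and the four relation names are all hypothetical. The paper defines nine relationships, which are not enumerated in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """One foreground object (graph node). Parameterization is an assumption."""
    label: str
    center: tuple[float, float, float]  # x (forward), y (left), z (up), in meters
    size: tuple[float, float, float]    # length, width, height, in meters
    yaw: float = 0.0                    # heading angle in radians


@dataclass
class SceneGraph:
    """Nodes are boxes; edges are (subject index, relation, object index)."""
    nodes: list[Box] = field(default_factory=list)
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_node(self, box: Box) -> int:
        self.nodes.append(box)
        return len(self.nodes) - 1

    def add_edge(self, i: int, relation: str, j: int) -> None:
        n = len(self.nodes)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("edge must connect two distinct existing nodes")
        self.edges.append((i, relation, j))

    def triples(self) -> list[tuple[str, str, str]]:
        """Human-readable (subject label, relation, object label) triples."""
        return [(self.nodes[i].label, r, self.nodes[j].label)
                for i, r, j in self.edges]


def infer_relation(a: Box, b: Box) -> str:
    """Toy rule (hypothetical): pick the dominant axis of the center offset."""
    dx = a.center[0] - b.center[0]
    dy = a.center[1] - b.center[1]
    if abs(dx) >= abs(dy):
        return "in_front_of" if dx > 0 else "behind"
    return "left_of" if dy > 0 else "right_of"


graph = SceneGraph()
car = graph.add_node(Box("car", (10.0, 0.0, 0.0), (4.5, 1.9, 1.6)))
ped = graph.add_node(Box("pedestrian", (5.0, 3.0, 0.0), (0.6, 0.6, 1.7)))
graph.add_edge(car, infer_relation(graph.nodes[car], graph.nodes[ped]), ped)
print(graph.triples())
```

A graph like this is what a layout generator would be conditioned on: the user edits nodes and relations, and the model is expected to produce boxes that satisfy them.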

Paper Structure

This paper contains 10 sections, 13 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Motivation of customizable LiDAR scene generation from "La La LiDAR". Our framework consists of three key stages: 1) LiDAR layout generation using scene graphs, where nodes represent objects and edges capture their spatial relationships; 2) foreground point cloud synthesis, either by retrieving from a database or by generating conditioned on layout parameters; and 3) foreground-conditioned scene generation, where the synthesized foreground serves as conditioning to generate the complete scene with realistic environmental context. This hierarchical approach enables fine-grained control over foreground object placement while maintaining overall scene coherence.
  • Figure 2: The proposed LiDAR point cloud layout generation framework. Our approach begins with scene graph construction, establishing both node embeddings ($o_i$) and edge embeddings ($o_{i\rightarrow j}$) to capture spatial relationships. These are enhanced with semantic features from a CLIP text encoder ($g_i$, $g_{i\rightarrow j}$), creating a comprehensive semantic graph. A graph encoder then processes this information to produce a latent semantic graph with enriched node representations ($V_i^Z$). During the diffusion process, layout states ($b_i^t$) are iteratively refined through a denoising network that incorporates time-dependent contextual conditioning ($\mathcal{C}_t$), which dynamically aggregates graph features at each timestep. This ensures consistent spatial relationships throughout the denoising process. The final stage synthesizes and places appropriate foreground points according to the generated layout.
  • Figure 3: The schematic definition of the nine relationships (foreground objects) in our LiDAR scene graph formulation.
  • Figure 4: Architecture of our foreground-aware LiDAR scene generation framework. Upper Part: The diffusion-based generation process, where initial Gaussian noise $X_T \sim \mathcal{N}(0,\sigma^2I)$ is progressively denoised to generate the final scene $Z_0$, conditioned on a foreground input $H_0$ via our FCI. Lower Part: The FCI mechanism extracts features from $H_0$ and transforms them into adaptive scale and shift parameters. These modulate the intermediate features $X_f^t$ in the denoising network through channel-wise gating with attention weights $\omega$, resulting in refined features $X_p^t$ that preserve object details. This design ensures spatial coherence and semantic consistency in the generated scene $Z_0$.
  • Figure 5: Qualitative comparisons of La La LiDAR against state-of-the-art LiDAR scene generation approaches on the nuScenes dataset. From left to right: Reference (ground truth), LiDARGen, R2DM, and our method.
  • ...and 1 more figure
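The Figure 4 caption describes the FCI as turning foreground features into adaptive scale and shift parameters that modulate intermediate denoiser features through channel-wise gating. A rough NumPy sketch of that kind of modulation follows. It is written under stated assumptions and is not the paper's actual formulation: the function name `fci_modulate`, the linear projections `w_scale`, `w_shift`, and `w_gate`, the pooled foreground vector `h0_feat`, and the residual form $X_p = X_f + \omega \odot (\gamma \odot X_f + \beta)$ are all hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here to keep gate weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fci_modulate(x_f, h0_feat, w_scale, w_shift, w_gate):
    """Gated scale-and-shift modulation (an assumed form, not the paper's).

    x_f:     (C, H, W) intermediate denoiser features
    h0_feat: (D,) pooled feature vector extracted from the foreground input
    w_*:     (C, D) hypothetical linear projections
    """
    gamma = w_scale @ h0_feat         # per-channel scale, shape (C,)
    beta = w_shift @ h0_feat          # per-channel shift, shape (C,)
    omega = sigmoid(w_gate @ h0_feat) # per-channel gate in (0, 1), shape (C,)

    # Broadcast the per-channel parameters over the spatial dimensions.
    modulation = gamma[:, None, None] * x_f + beta[:, None, None]
    return x_f + omega[:, None, None] * modulation


# Tiny example: 2 channels, 1x1 spatial map, 1-dimensional foreground feature.
x_f = np.ones((2, 1, 1))
h0_feat = np.array([1.0])
w_scale = np.array([[1.0], [0.0]])
w_shift = np.array([[0.0], [2.0]])
w_gate = np.zeros((2, 1))             # sigmoid(0) = 0.5 for both channels

x_p = fci_modulate(x_f, h0_feat, w_scale, w_shift, w_gate)
print(x_p.ravel())
```

With this residual form, zero scale and shift leave the features unchanged, so the foreground condition can only add to what the denoiser already computes. That property is a consequence of the form assumed here, not something the caption states.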