LidarPainter: One-Step Away From Any Lidar View To Novel Guidance

Yuzhou Ji, Ke Ma, Hong Cai, Anchun Zhang, Lizhuang Ma, Xin Tan

TL;DR

This work tackles the challenge of reconstructing dynamic driving scenes when novel viewpoints diverge from the original trajectory, which often corrupts background and vehicle details. It introduces LidarPainter, a one-step diffusion model that conditions image refinement on LiDAR renderings and artifact-prone views, using a Latent Attention Fusion module to merge structural guidance with high-frequency detail. The approach leverages a dynamic Gaussian Splatting representation, enabling faster, more memory-efficient generation than existing video-diffusion baselines and supporting stylized, text-driven variations. The method delivers superior quality, consistency, and speed, with practical impact for digital twins and autonomous-driving simulations, and it demonstrates strong potential for broader downstream tasks through ablations and stylization experiments.

Abstract

Dynamic driving scene reconstruction is of great importance in fields like digital twin systems and autonomous driving simulation. However, unacceptable degradation occurs when the view deviates from the input trajectory, leading to corrupted background and vehicle models. Existing methods for improving reconstruction quality on novel trajectories suffer from various limitations, including inconsistency, deformation, and time consumption. This paper proposes LidarPainter, a one-step diffusion model that recovers consistent driving views from sparse LiDAR conditions and artifact-corrupted renderings in real time, enabling high-fidelity lane shifts in driving scene reconstruction. Extensive experiments show that LidarPainter outperforms state-of-the-art methods in speed, quality, and resource efficiency: it is 7× faster than StreetCrafter while requiring only one fifth of the GPU memory. LidarPainter also supports stylized generation using text prompts such as "foggy" and "night", allowing for a diverse expansion of the existing asset library.

Paper Structure

This paper contains 17 sections, 12 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison of diffusion guidance sampled on 7000 iterations. Our method shows much better fidelity and consistency, with clear characters on the truck, while StreetCrafter generates corrupted vehicles and text.
  • Figure 2: LidarPainter Reconstruction Pipeline.
  • Figure 3: Qualitative comparisons of image generation and 3D scene reconstruction results on different scenes.
  • Figure 4: Ablation on Latent Attention Fusion (LAF).
  • Figure 5: Prompted generation of LidarPainter.