UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation

Yichong Lu, Yichi Cai, Shangzhan Zhang, Hongyu Zhou, Haoji Hu, Huimin Yu, Andreas Geiger, Yiyi Liao

TL;DR

UrbanCAD tackles the challenge of producing photorealistic and highly controllable 3D vehicle digital twins from a single urban image. It does so with a retrieval-optimization pipeline that selects CAD geometries and material graphs from large libraries of free models, followed by part-aware material optimization via differentiable rendering. The framework also addresses realistic scene insertion by estimating spatially varying lighting from fisheye views and reconstructing backgrounds with 3D Gaussian Splatting (3DGS). This enables rendering city scenes from novel viewpoints and testing perception systems safely. Experiments show superior photorealism (FID/KID/LPIPS) and demonstrate the value of high controllability for generating safety-critical out-of-distribution (OOD) driving scenarios, while revealing limitations in geometry alignment and in lighting fidelity for distant insertions.
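
The two stages of the retrieval-optimization pipeline can be illustrated with a deliberately simplified sketch. It assumes every library entry already has a feature embedding (this summary does not say how UrbanCAD scores candidates) and ranks candidates by cosine similarity. It then fits one part's albedo with a toy Lambertian model (pixel = albedo × shading), adding a penalty that keeps the result near the retrieved material. The names `retrieve_top_k`, `optimize_part_albedo`, and `optimize_parts`, the embeddings, and the weight `lam` are hypothetical. The paper optimizes material graphs through a differentiable renderer, and this scalar model only stands in for that step.

```python
import math
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[float, float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Sequence[float],
                   library: Dict[str, Sequence[float]],
                   k: int = 3) -> List[Tuple[str, float]]:
    """Stage 1 (retrieval, hypothetical): rank library entries by embedding similarity."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def optimize_part_albedo(prior: RGB,
                         shading: Sequence[float],
                         observed: Sequence[RGB],
                         lam: float = 0.1,
                         lr: float = 0.5,
                         steps: int = 500) -> RGB:
    """Stage 2 (optimization, toy): fit one part's albedo to its observed pixels.

    Predicted pixel = albedo * shading. Loss = mean squared photometric error
    + lam * (albedo - prior)^2, so the fit stays close to the retrieved material.
    Gradient descent with an analytic gradient stands in for a differentiable
    renderer; lr=0.5 converges for shading values in [0, 1] and lam < 1.
    """
    n = len(shading)
    fitted = []
    for c in range(3):  # channels are independent in this toy model
        a = prior[c]
        for _ in range(steps):
            data_grad = sum(2.0 * (a * s - o[c]) * s for s, o in zip(shading, observed)) / n
            prior_grad = 2.0 * lam * (a - prior[c])
            a -= lr * (data_grad + prior_grad)
        fitted.append(min(max(a, 0.0), 1.0))  # keep albedo in [0, 1]
    return (fitted[0], fitted[1], fitted[2])


def optimize_parts(parts: Dict[str, Tuple[RGB, Sequence[float], Sequence[RGB]]],
                   lam: float = 0.1) -> Dict[str, RGB]:
    """Part-aware step: optimize each segmented part (e.g. body, window) separately."""
    return {name: optimize_part_albedo(prior, shading, observed, lam=lam)
            for name, (prior, shading, observed) in parts.items()}
```

With `lam=0` the fit reduces to least squares on the observed pixels. Raising `lam` trades fidelity to the single observation for closeness to the expert-designed prior, which is one way to read the "preserving priors" goal stated in the abstract.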

Abstract

Photorealistic 3D vehicle models with high controllability are essential for autonomous driving simulation and data augmentation. While handcrafted CAD models provide flexible controllability, free CAD libraries often lack the high-quality materials necessary for photorealistic rendering. Conversely, reconstructed 3D models offer high-fidelity rendering but lack controllability. In this work, we introduce UrbanCAD, a framework that generates highly controllable and photorealistic 3D vehicle digital twins from a single urban image, leveraging a large collection of free 3D CAD models and handcrafted materials. To achieve this, we propose a novel retrieval-optimization pipeline that adapts to observational data while preserving fine-grained, expert-designed priors for both geometry and material. This enables realistic 360-degree rendering of vehicles, background insertion, material transfer, relighting, and component manipulation. Furthermore, given multi-view perspective and fisheye images of the background, we approximate environment lighting from the fisheye images and reconstruct the background with 3D Gaussian Splatting (3DGS), enabling the photorealistic insertion of optimized CAD models into backgrounds rendered from novel views. Experimental results demonstrate that UrbanCAD outperforms baselines in terms of photorealism. Additionally, we show that various perception models maintain their accuracy when evaluated on UrbanCAD data with in-distribution configurations but degrade on realistic out-of-distribution data generated by our method. This suggests that UrbanCAD is a significant advancement in creating photorealistic, safety-critical driving scenarios for downstream applications.
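
Approximating environment lighting from a fisheye image starts with turning each fisheye pixel into a viewing direction and binning it into a latitude-longitude (equirectangular) map. The sketch below assumes an ideal equidistant fisheye model (r = f·θ) with known focal length `f` and principal point (`cx`, `cy`). A real rig's calibrated camera model will differ, and the function names are hypothetical. It produces only a low-dynamic-range, direction-averaged map, not the spatially varying lighting estimator described for UrbanCAD.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[float, float, float]


def fisheye_pixel_to_direction(u: float, v: float, f: float,
                               cx: float, cy: float) -> Optional[Vec3]:
    """Back-project pixel (u, v) to a unit ray, ASSUMING the equidistant model r = f * theta.

    Camera frame uses the OpenCV convention: x right, y down, z forward.
    Returns None when theta exceeds pi (not a valid viewing direction).
    """
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    theta = r / f  # angle from the optical axis
    if theta > math.pi:
        return None
    if r == 0.0:
        return (0.0, 0.0, 1.0)
    s = math.sin(theta) / r  # sin(theta) * (dx / r) = sin(theta) * cos(phi)
    return (dx * s, dy * s, math.cos(theta))


def direction_to_equirect(d: Vec3, width: int, height: int) -> Tuple[int, int]:
    """Map a unit direction to (row, col) in a latitude-longitude map."""
    x, y, z = d
    lon = math.atan2(x, z)                     # 0 along the optical axis
    lat = math.asin(max(-1.0, min(1.0, -y)))  # +pi/2 points up because y points down
    col = int((lon / (2.0 * math.pi) + 0.5) * width) % width
    row = min(int((0.5 - lat / math.pi) * height), height - 1)
    return row, col


def fisheye_to_envmap(image: Sequence[Sequence[RGB]], f: float, cx: float, cy: float,
                      width: int, height: int) -> List[List[Optional[RGB]]]:
    """Splat fisheye pixels (sampled at integer coordinates) into an
    equirectangular map and average per cell; unreached cells stay None."""
    sums = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for v, row_pixels in enumerate(image):
        for u, rgb in enumerate(row_pixels):
            d = fisheye_pixel_to_direction(u, v, f, cx, cy)
            if d is None:
                continue
            r, c = direction_to_equirect(d, width, height)
            for k in range(3):
                sums[r][c][k] += rgb[k]
            counts[r][c] += 1
    env: List[List[Optional[RGB]]] = []
    for r in range(height):
        env.append([
            (sums[r][c][0] / counts[r][c], sums[r][c][1] / counts[r][c], sums[r][c][2] / counts[r][c])
            if counts[r][c] else None
            for c in range(width)
        ])
    return env
```

Averaging per cell keeps the sketch simple. A practical pipeline would also need exposure and HDR handling before such a map could light a rendered vehicle.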

Paper Structure

This paper contains 38 sections, 5 equations, 22 figures, 6 tables.

Figures (22)

  • Figure 1: Overview of UrbanCAD. Given a single-view input image, we first perform CAD model retrieval and retrieval-based material optimization to create photorealistic and highly controllable vehicle digital twins (left). Given multi-view background images, we then perform realistic vehicle insertion to create diverse synthetic data for testing self-driving systems (right).
  • Figure 2: Window recognition results on the colored material design, the retrieved CAD rendering, and data augmented by ControlNet (Zhang2023AddingCC).
  • Figure 3: Qualitative results on KITTI-360 for novel view synthesis from reference (Ref.) and rotated (rot.) viewpoints. UrbanCAD produces more robust and realistic results at the novel viewpoint compared to the baselines.
  • Figure 4: More pairs of 3D vehicles after CAD model retrieval and material optimization (right) alongside the input single-view segmented vehicles (left). UrbanCAD produces photorealistic 3D vehicles of different categories from single-view inputs.
  • Figure 5: Recovery from partial observation.
  • ...and 17 more figures
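
Figure 1 (right) describes inserting optimized vehicles into rendered backgrounds. One ingredient of such insertion is occlusion-aware compositing: the rendered vehicle should appear only where it is closer to the camera than the background surface. The sketch below assumes both renderers output pixel-aligned RGB and depth maps, and that the vehicle renderer also outputs an alpha mask. The function name `composite_vehicle` is hypothetical. It applies the standard "over" operator with a depth test and ignores shadows and lighting, which the paper handles separately.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_vehicle(bg_rgb: Sequence[Sequence[RGB]],
                      bg_depth: Sequence[Sequence[float]],
                      fg_rgb: Sequence[Sequence[RGB]],
                      fg_alpha: Sequence[Sequence[float]],
                      fg_depth: Sequence[Sequence[float]]) -> List[List[RGB]]:
    """Depth-tested 'over' compositing of a rendered vehicle onto a background.

    Where the vehicle is in front of the background, out = a * fg + (1 - a) * bg;
    elsewhere the background pixel is kept unchanged (a = 0).
    """
    height, width = len(bg_rgb), len(bg_rgb[0])
    out: List[List[RGB]] = []
    for y in range(height):
        row: List[RGB] = []
        for x in range(width):
            visible = fg_depth[y][x] < bg_depth[y][x]  # vehicle occludes background
            a = fg_alpha[y][x] if visible else 0.0
            f, b = fg_rgb[y][x], bg_rgb[y][x]
            row.append((a * f[0] + (1 - a) * b[0],
                        a * f[1] + (1 - a) * b[1],
                        a * f[2] + (1 - a) * b[2]))
        out.append(row)
    return out
```

Using the vehicle's own alpha keeps anti-aliased silhouette edges soft. The strict depth comparison means that where the vehicle sits behind a background object, such as a parked car or pole, the background wins.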