GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation

Hao Zhang, Lue Fan, Qitai Wang, Wenbo Li, Zehuan Wu, Lewei Lu, Zhaoxiang Zhang, Hongsheng Li

TL;DR

GA-Drive is a simulation framework that generates camera views along user-specified novel trajectories by decoupling scene geometry from appearance and using diffusion-based generation; it substantially outperforms existing methods on NTA-IoU, NTL-IoU, and FID.

Abstract

A free-viewpoint, editable, and high-fidelity driving simulator is crucial for training and evaluating end-to-end autonomous driving systems. In this paper, we present GA-Drive, a novel simulation framework that generates camera views along user-specified novel trajectories through geometry-appearance decoupling and diffusion-based generation. Given a set of images captured along a recorded trajectory and the corresponding scene geometry, GA-Drive uses the geometry to synthesize novel pseudo-views, which a trained video diffusion model then transforms into photorealistic views. In this way, we decouple the geometry and appearance of scenes. A key advantage of this decoupling is that it supports appearance editing with state-of-the-art video-to-video editing techniques while preserving the underlying geometry, which enables consistent edits across both original and novel trajectories. Extensive experiments demonstrate that GA-Drive substantially outperforms existing methods in terms of NTA-IoU, NTL-IoU, and FID scores.
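To make the decoupling concrete, the following is a minimal Python sketch of how the two stages could be orchestrated. It is an illustration under stated assumptions, not the authors' code: `render_pseudo_view` (the geometry stage) and `refine_segment` (the appearance stage, standing in for the trained video diffusion model) are hypothetical callables, and the default segment length of 8 frames is arbitrary. The segment-by-segment loop follows the Figure 2 caption, which says the diffusion model is applied iteratively to each segment.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical orchestration of the two decoupled stages; a sketch, not GA-Drive's actual code.


def split_into_segments(n_frames: int, seg_len: int) -> List[range]:
    """Partition frame indices 0..n_frames-1 into consecutive, non-overlapping segments."""
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    return [range(s, min(s + seg_len, n_frames)) for s in range(0, n_frames, seg_len)]


def generate_novel_views(
    novel_poses: Sequence[Any],
    render_pseudo_view: Callable[[Any], Any],          # hypothetical geometry stage: pose -> pseudo-view
    refine_segment: Callable[[List[Any]], List[Any]],  # hypothetical appearance stage: pseudo-views -> frames
    seg_len: int = 8,                                  # assumed segment length
) -> List[Any]:
    """Run the geometry stage for every pose, then the appearance stage one segment at a time."""
    pseudo_views = [render_pseudo_view(pose) for pose in novel_poses]
    frames: List[Any] = []
    for seg in split_into_segments(len(pseudo_views), seg_len):
        out = refine_segment([pseudo_views[i] for i in seg])
        if len(out) != len(seg):
            raise ValueError("refine_segment must return one frame per pseudo-view")
        frames.extend(out)
    return frames
```

Because the appearance stage only ever sees pseudo-views, the trajectory length is bounded only by how many segments are processed. This sketch treats segments independently; whether GA-Drive overlaps segments or conditions each one on previously generated frames is not specified here.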

Paper Structure

This paper contains 23 sections, 3 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: GA-Drive generates high-quality novel views along shifted trajectories. By decoupling geometry from appearance, our method allows flexible appearance editing, such as altering the car’s appearance or changing the weather to fog, without affecting the underlying geometry. The red bounding boxes show that appearance edits are consistent across different trajectories.
  • Figure 2: Overview of the GA-Drive framework. Pseudo-views are created by casting rays from novel poses into 3D space using rendered depth maps to obtain a 3D point cloud, which is then projected onto the recorded views to sample color information. These pseudo-views are subsequently transformed into photorealistic frames by our trained video diffusion model (referred to as Generation). The black curved line indicates the recorded camera trajectory. By iteratively applying the video diffusion model to each segment, our method can generate photorealistic novel views of unlimited length.
  • Figure 3: (a) illustrates a 2D version of the visibility check. Because the rendered depth is smaller than the point's depth, the point is occluded by a surface and should be masked. (b) shows an incorrect pseudo-view produced without the visibility check, while (c) shows the correct result with visibility properly handled.
  • Figure 4: The architecture of our video diffusion model. Our segment-wise video diffusion model generates videos conditioned on the novel pseudo-views.
  • Figure 5: The pseudo-view simulation pipeline, designed to simulate the characteristic patterns of novel pseudo-views.
  • ...and 3 more figures
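The pseudo-view construction described in the Figure 2 and Figure 3 captions (cast rays from a novel pose using its rendered depth, lift them to 3D points, project those points into a recorded view to sample color, and mask points that fail the visibility check) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes a pinhole camera with intrinsics `K` shared by both views, equal image sizes, 4x4 camera-to-world poses, z-depth maps, nearest-neighbor color sampling, a single recorded view, and an absolute depth tolerance `tol`; all function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating depth-based warping; a sketch, not GA-Drive's actual code.


def backproject(depth, K, cam_to_world):
    """Lift each pixel of a z-depth map to a world-space 3D point (pinhole camera)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T          # camera-space ray directions with z = 1
    pts_cam = rays * depth.reshape(-1, 1)    # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]   # (h*w, 3) world points


def project(points, K, world_to_cam):
    """Project world points into a camera; return pixel coordinates and camera-space depth."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    pts_cam = (pts_h @ world_to_cam.T)[:, :3]
    z = pts_cam[:, 2]
    uv = (pts_cam @ K.T)[:, :2] / np.maximum(z, 1e-8)[:, None]
    return uv, z


def make_pseudo_view(novel_depth, novel_c2w, rec_image, rec_depth, rec_c2w, K, tol=0.05):
    """Warp colors from one recorded (H, W, C) view into a novel view.

    Returns the (H, W, C) pseudo-view and an (H, W) boolean mask of valid pixels.
    """
    h, w = novel_depth.shape
    points = backproject(novel_depth, K, novel_c2w)        # rays cast from the novel pose
    uv, z = project(points, K, np.linalg.inv(rec_c2w))     # points seen from the recorded pose
    u, v = uv[:, 0], uv[:, 1]
    # Keep points in front of the recorded camera whose rounded pixel lies inside the image.
    inside = (z > 0) & (u >= -0.5) & (u < w - 0.5) & (v >= -0.5) & (v < h - 0.5)
    idx = np.flatnonzero(inside)
    ui = np.clip(np.round(u[idx]).astype(int), 0, w - 1)
    vi = np.clip(np.round(v[idx]).astype(int), 0, h - 1)
    # Visibility check: if the recorded view's rendered depth is smaller than the
    # point's depth (beyond tol), a nearer surface occludes the point, so mask it.
    visible = rec_depth[vi, ui] >= z[idx] - tol
    keep = idx[visible]
    pseudo = np.zeros((h * w, rec_image.shape[2]), dtype=rec_image.dtype)
    mask = np.zeros(h * w, dtype=bool)
    pseudo[keep] = rec_image[vi[visible], ui[visible]]     # nearest-neighbor color sampling
    mask[keep] = True
    return pseudo.reshape(h, w, -1), mask.reshape(h, w)
```

Pixels with `mask == False` are holes, either outside the recorded view or occluded in it. They could be filled from other recorded views or left for the diffusion model to complete; how GA-Drive combines multiple recorded views is not described here.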