Boosting Visual Fidelity in Driving Simulations through Diffusion Models
Fanjun Bu, Hiroshi Yasuda
TL;DR
The paper addresses the low physical validity of driving simulations caused by limited visual fidelity. It introduces DRIVE, a diffusion-model-based pipeline that performs real-time style transfer to render photorealistic views, using ControlNet conditioning, LoRA adapters, Latent Consistency Models, and ROS 2 for cross-machine deployment. The authors present a detailed system design covering acceleration, within-frame consistency, and cross-frame consistency, along with a preliminary user study that indicates improved visual realism with DRIVE, although the evidence on task performance remains inconclusive. The work demonstrates a scalable, data-driven path to greater realism in driving simulations and outlines practical guidelines and future directions, including image harmonization in mixed reality and interactive conditioning with human feedback.
Abstract
Diffusion models have made substantial progress in facilitating image generation and editing. As the technology matures, we see its potential in the context of driving simulations to enhance the simulated experience. In this paper, we explore this potential through the introduction of a novel system designed to boost visual fidelity. Our system, DRIVE (Diffusion-based Realism Improvement for Virtual Environments), leverages a diffusion model pipeline to give a simulated environment a photorealistic view, with the flexibility to be adapted for other applications. We conducted a preliminary user study to assess the system's effectiveness in rendering realistic visuals and supporting participants in performing driving tasks. Our work not only lays the groundwork for future research on the integration of diffusion models in driving simulations but also provides practical guidelines and best practices for their application in this context.
