When Digital Twins Meet Large Language Models: Realistic, Interactive, and Editable Simulation for Autonomous Driving

Tanmay Vilas Samak, Chinmay Vilas Samak, Bing Li, Venkat Krovi

TL;DR

This framework combines physics-based and data-driven techniques to develop and simulate digital twins of autonomous vehicles and their operating environments. It also incorporates a large language model (LLM) interface for editing driving scenarios online via natural language prompts, with ~85% generalizability and ~95% repeatability.

Abstract

Simulation frameworks have been key enablers for the development and validation of autonomous driving systems. However, existing methods struggle to comprehensively address the autonomy-oriented requirements of balancing: (i) dynamical fidelity, (ii) photorealistic rendering, (iii) context-relevant scenario orchestration, and (iv) real-time performance. To address these limitations, we present a unified framework for creating and curating high-fidelity digital twins to accelerate advancements in autonomous driving research. Our framework leverages a mix of physics-based and data-driven techniques for developing and simulating digital twins of autonomous vehicles and their operating environments. It is capable of reconstructing real-world scenes and assets with geometric and photorealistic accuracy (~97% structural similarity) and infusing them with physical properties to enable real-time (>60 Hz) dynamical simulation of the ensuing driving scenarios. Additionally, it incorporates a large language model (LLM) interface to flexibly edit the driving scenarios online via natural language prompts, with ~85% generalizability and ~95% repeatability. Finally, an optional vision language model (VLM) provides ~80% visual enhancement by blending the hybrid scene composition.

Paper Structure

This paper contains 12 sections, 3 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Proposed framework for curating autonomy-oriented digital twins, which blends photorealism and physical interaction with LLM-driven scenario orchestration for enhanced serviceability. Video: https://youtu.be/3ovFiRgbHFc
  • Figure 2: Proposed approach to (a) photorealistic and geometrically accurate 3D scene reconstruction, (b) hybrid scene composition of physics-based and data-driven digital twins for physically interactive and graphically realistic simulation, and (c) LLM-guided context-aware scenario reconfiguration with optional VLM-guided visual enhancement.
  • Figure 3: High-fidelity reconstruction: Digital twin of the CU-ICAR campus showing (a) photorealistic rendering via 3DGS; (b) geometric accuracy w.r.t. 3D LIDAR point cloud data; (c) dynamical interaction via PSR; (d) visualization of reconstructed depth channel; (e) co-existence of 3DGS assets like a passenger car, a traffic cone, and a pedestrian sign with 3DMM assets like a road barrier, cement rubble, and a pedestrian; (f) real-time autonomy-oriented simulation with sensor visualization.
  • Figure 4: Samples of reconstructed scenes (a) CU-ICAR, (b) CGEC, (c) AuE lab, (d) lit footpath, (e) unlit footpath; and assets (f) passenger car, (g) traffic cone, (h) pedestrian sign. Notice the shadows and reflections captured from real-world data.
  • Figure 5: Worked example of scenario reconfiguration: (a) User prompt "make it rain" being handled by level-1 LLM agent, (b) scenario design requirements being parsed by level-2 LLM agent, and (c) specialized tasks being implemented by rule-based agent to modify the existing scene.
  • ...and 2 more figures
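
The three-stage flow in the Figure 5 caption (level-1 LLM agent, level-2 LLM agent, rule-based agent) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `Scene` fields, the task names, and the keyword matching that stands in for the two LLM agents are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical scene state; the real framework's scene model is not shown in the source."""
    weather: str = "clear"
    time_of_day: str = "day"
    assets: list = field(default_factory=list)


def level1_agent(prompt: str) -> str:
    """Stand-in for the level-1 LLM agent: turns the raw user prompt into a normalized intent."""
    return prompt.strip().lower()


def level2_agent(intent: str) -> list:
    """Stand-in for the level-2 LLM agent: parses the intent into structured design requirements.

    Keyword matching replaces the LLM call so the sketch stays self-contained.
    """
    requirements = []
    if "rain" in intent:
        requirements.append({"task": "set_weather", "value": "rain"})
    if "night" in intent:
        requirements.append({"task": "set_time_of_day", "value": "night"})
    return requirements


def rule_based_agent(scene: Scene, requirements: list) -> Scene:
    """Applies each structured requirement to the existing scene with fixed rules."""
    for req in requirements:
        if req["task"] == "set_weather":
            scene.weather = req["value"]
        elif req["task"] == "set_time_of_day":
            scene.time_of_day = req["value"]
        else:
            raise ValueError(f"unsupported task: {req['task']}")
    return scene


def reconfigure(scene: Scene, prompt: str) -> Scene:
    """Chains the three stages: prompt -> intent -> requirements -> modified scene."""
    return rule_based_agent(scene, level2_agent(level1_agent(prompt)))


if __name__ == "__main__":
    print(reconfigure(Scene(), "make it rain"))
```

Keeping the final stage rule-based, as the caption describes, means the scene is only ever changed through a fixed set of named tasks, so an unexpected agent output fails loudly instead of silently altering the scene.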