When Digital Twins Meet Large Language Models: Realistic, Interactive, and Editable Simulation for Autonomous Driving
Tanmay Vilas Samak, Chinmay Vilas Samak, Bing Li, Venkat Krovi
TL;DR
This framework leverages a mix of physics-based and data-driven techniques to develop and simulate digital twins of autonomous vehicles and their operating environments, and incorporates a large language model interface for editing driving scenarios online via natural language prompts, with ~85% generalizability and ~95% repeatability.
Abstract
Simulation frameworks have been key enablers for the development and validation of autonomous driving systems. However, existing methods struggle to balance the autonomy-oriented requirements of (i) dynamical fidelity, (ii) photorealistic rendering, (iii) context-relevant scenario orchestration, and (iv) real-time performance. To address these limitations, we present a unified framework for creating and curating high-fidelity digital twins to accelerate advancements in autonomous driving research. Our framework leverages a mix of physics-based and data-driven techniques for developing and simulating digital twins of autonomous vehicles and their operating environments. It is capable of reconstructing real-world scenes and assets with geometric and photorealistic accuracy (~97% structural similarity) and infusing them with physical properties to enable real-time (>60 Hz) dynamical simulation of the ensuing driving scenarios. Additionally, it incorporates a large language model (LLM) interface to flexibly edit the driving scenarios online via natural language prompts, with ~85% generalizability and ~95% repeatability. Finally, an optional vision-language model (VLM) provides ~80% visual enhancement by blending the hybrid scene composition.
