From Virtual Environments to Real-World Trials: Emerging Trends in Autonomous Driving

A. Humnabadkar; A. Sikdar; B. Cave; H. Zhang; N. Bessis; A. Behera

Abstract

Autonomous driving technologies have achieved significant advances in recent years, yet their real-world deployment remains constrained by data scarcity, safety requirements, and the need for generalization across diverse environments. In response, synthetic data and virtual environments have emerged as powerful enablers, offering scalable, controllable, and richly annotated scenarios for training and evaluation. This survey presents a comprehensive review of recent developments at the intersection of autonomous driving, simulation technologies, and synthetic datasets. We organize the landscape across three core dimensions: (i) the use of synthetic data for perception and planning, (ii) digital twin-based simulation for system validation, and (iii) domain adaptation strategies bridging synthetic and real-world data. We also highlight the role of vision-language models and simulation realism in enhancing scene understanding and generalization. A detailed taxonomy of datasets, tools, and simulation platforms is provided, alongside an analysis of trends in benchmark design. Finally, we discuss critical challenges and open research directions, including Sim2Real transfer, scalable safety validation, cooperative autonomy, and simulation-driven policy learning, that must be addressed to accelerate the path toward safe, generalizable, and globally deployable autonomous driving systems.

Paper Structure (24 sections, 1 equation, 6 figures, 7 tables)

Introduction
Vision-Language Models (VLMs) for Contextual Scene Understanding
Annotation-Based Taxonomy of Autonomous Driving Datasets
Datasets with 2D Bounding Box Annotations
Datasets with Segmentation Annotations
Datasets with 3D Bounding Box Annotations
Simulation Technologies: Bridging Theory and Practice
The Role of Sim2Real and Real2Sim in Generating Complex Driving Scenarios
Addressing the Domain Gap Challenge
Generating Robustness in Real2Sim and Sim2Real
Digital Twins: A Comprehensive Approach to AV Testing
Scenario-Based Testing with Digital Twins
Digital Twins and Domain Gap Mitigation
Neural Reconstruction for AV Validation
Critical Comparison and Trade-off Analysis
...and 9 more sections

Figures (6)

Figure 1: Illustration of a traditional AV perception-control pipeline. Sensor inputs (cameras, radar, LiDAR, GPS) capture environmental data, which is filtered and processed in a preprocessing stage. The refined data feeds into deep learning models for object detection and prediction. The outputs then guide high-level decision-making modules for path planning and behavior prediction, which are executed by the control system. A feedback loop continuously updates the system for real-time adjustments.
Figure 2: A unified framework for driving scene understanding in which several critical components (sensor fusion, object recognition, semantic segmentation, motion prediction, mapping, and localization) work together to enable robust and safe driving. Each component addresses specific perception, prediction, or decision-making challenges. Together, they allow the vehicle to navigate dynamic environments with precision and reliability.
Figure 3: A DriveGPT workflow in which natural language inputs from users are interpreted by LLM agents to allocate 3D models, gather relevant contextual elements such as the surroundings and movement dynamics, and execute rendering functions. The workflow also includes view modification and background creation, ultimately producing a photorealistic driving scene with dynamic simulation. It demonstrates how advanced language models can be integrated with simulation tools to enable precise and flexible autonomous vehicle testing and scenario visualization, bridging the gap between human intent and machine-generated environments.
Figure 4: Illustration of the integration of static, dynamic, and external factors into a unified framework for driving scene understanding. Together, these factors inform perception tasks, guide decision-making processes, and enable domain adaptation strategies to handle the complexities of real-world environments. By considering all three aspects simultaneously, AVs can gain comprehensive insight into their environment, leading to safer and more efficient navigation.
Figure 5: Venn diagram showing the distribution of recent driving datasets across annotation types. About 50–60% of datasets provide only 2D bounding boxes; 20–30% include 3D bounding boxes along with other annotations; nearly 20% include segmentation masks (again, typically alongside other modalities). Only a small fraction (5–10%) of datasets contain all three annotation types. Moreover, accounting for static, dynamic, and external factors makes data collection and labeling even more challenging. This underscores the need to supplement real datasets with simulation-based pipelines to cover rare or complex scenarios.
...and 1 more figures
