Sim2Real-AD: A Modular Sim-to-Real Framework for Deploying VLM-Guided Reinforcement Learning in Real-World Autonomous Driving

Zilin Huang, Zhengyang Wan, Zihao Sheng, Boyue Wang, Junwei You, Yue Leng, Sikai Chen

Abstract

Deploying reinforcement learning policies trained in simulation to real autonomous vehicles remains a fundamental challenge, particularly for VLM-guided RL frameworks whose policies are typically learned with simulator-native observations and simulator-coupled action semantics that are unavailable on physical platforms. This paper presents Sim2Real-AD, a modular framework for zero-shot sim-to-real transfer of CARLA-trained VLM-guided RL policies to full-scale vehicles without any real-world RL training data. The framework decomposes the transfer problem into four components: a Geometric Observation Bridge (GOB) that converts monocular front-view images into simulator-compatible bird's-eye-view (BEV) observations, a Physics-Aware Action Mapping (PAM) that translates policy outputs into platform-agnostic physical commands, a Two-Phase Progressive Training (TPT) strategy that stabilizes adaptation by separating action-space and observation-space transfer, and a Real-time Deployment Pipeline (RDP) that integrates perception, policy inference, control conversion, and safety monitoring for closed-loop execution. Simulation experiments show that the framework preserves the relative performance ordering of representative RL algorithms across different reward paradigms and validate the contribution of each module. Zero-shot deployment on a full-scale Ford E-Transit achieves success rates of 90%, 80%, and 75% in car-following, obstacle avoidance, and stop-sign interaction scenarios, respectively. To the best of our knowledge, this study is among the first to demonstrate zero-shot closed-loop deployment of a CARLA-trained VLM-guided RL policy on a full-scale real vehicle without any real-world RL training data. The demo video and code are available at: https://zilin-huang.github.io/Sim2Real-AD-website/.

Paper Structure

This paper contains 56 sections, 6 theorems, 46 equations, 21 figures, 7 tables, 2 algorithms.

Key Result

Theorem 1

Under bounded segmentation error $\epsilon_{\mathrm{seg}}$, bounded PID tracking error $\epsilon_{\mathrm{pid}}$, and bounded observation distribution gap $d_{\mathrm{TV}}(\mathcal{O}_2, \mathcal{O}^{\text{real}}) \leq \delta$, the expected cumulative reward of the Sim2Real-AD policy $\pi^{\text{rea…}}$ […] where $C_1, C_2, C_3 > 0$ are constants depending on the policy's Lipschitz constant $L_\pi$, the r[…]
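
As a reading aid for the observation-gap assumption, the following standard fact about total variation distance may help; it is not taken from the paper's proof, and the symbols $P$, $Q$, $f$, and $\|f\|_\infty$ are introduced here only for illustration. Taking $d_{\mathrm{TV}}(P, Q) = \sup_A |P(A) - Q(A)|$, any bounded measurable function $f$ satisfies

$$
\left| \mathbb{E}_{o \sim P}[f(o)] - \mathbb{E}_{o \sim Q}[f(o)] \right| \;\leq\; 2\, \|f\|_\infty \, d_{\mathrm{TV}}(P, Q).
$$

Substituting $P = \mathcal{O}_2$ and $Q = \mathcal{O}^{\text{real}}$ with $d_{\mathrm{TV}}(\mathcal{O}_2, \mathcal{O}^{\text{real}}) \leq \delta$ gives a change of at most $2\,\|f\|_\infty\,\delta$ in the expectation of any such $f$. This is one plausible reason a term linear in $\delta$ would appear in a bound of this kind.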

Figures (21)

  • Figure 1: Overview of the sim-to-real challenge and the proposed Sim2Real-AD framework. Direct transfer fails because of the coupled observation and dynamics gaps, while Sim2Real-AD addresses them through GOB, PAM, TPT, and RDP.
  • Figure 2: Overview of Sim2Real-AD. The framework bridges sim-to-real transfer through four components: the Geometric Observation Bridge (GOB), the Physics-Aware Action Mapping (PAM), the Two-Phase Progressive Training strategy (TPT), and the Real-time Deployment Pipeline (RDP). Instantiated here with DriveVLM-RL as the backbone, the framework is reward-agnostic and is demonstrated across multiple RL reward paradigms.
  • Figure 3: Geometric Observation Bridge from monocular front-view images to simulator-compatible BEV observations. Phase 1 uses simulator-privileged GT-BEV, whereas Phase 2 and real-world deployment use GOB-generated BEV with the same tensor shape and semantic channel layout.
  • Figure 4: ChatScene training curves: Original (dashed, GT-BEV + Direct Action, $\pm 1\sigma$ over 3 seeds) vs. Sim2Real-AD (solid, Phase 1: GT-BEV + PAM, Phase 2: GOB-BEV + PAM). Vertical dotted line marks Phase 1$\to$2 at $1{\times}10^6$ steps. (a) Collision rate. (b) Average speed. (c) Total distance. (d) Routes completed.
  • Figure 5: VLM-RL training curves: Original (dashed, GT-BEV + Direct Action, $\pm 1\sigma$ over 3 seeds) vs. Sim2Real-AD (solid, Phase 1: GT-BEV + PAM, Phase 2: GOB-BEV + PAM). Vertical dotted line marks Phase 1$\to$2 at $1{\times}10^6$ steps. (a) Collision rate. (b) Average speed. (c) Total distance. (d) Routes completed.
  • ...and 16 more figures
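
The two-phase schedule described in the Figure 4 and Figure 5 captions (Phase 1: GT-BEV + PAM, Phase 2: GOB-BEV + PAM, switching at $1{\times}10^6$ steps) can be sketched as a simple step-indexed lookup. This is a minimal illustration, not the authors' training code: the names `PhaseConfig` and `phase_for_step` are hypothetical, and assigning the step exactly at the boundary to Phase 2 is an assumption.

```python
from dataclasses import dataclass

# Switch point taken from the figure captions (Phase 1 -> 2 at 1e6 steps).
PHASE_SWITCH_STEP = 1_000_000


@dataclass(frozen=True)
class PhaseConfig:
    """Hypothetical container for what each training phase uses."""
    phase: int
    observation_source: str  # "GT-BEV" (simulator-privileged) or "GOB-BEV"
    action_interface: str    # "PAM" in both phases, per the captions


def phase_for_step(step: int, switch_step: int = PHASE_SWITCH_STEP) -> PhaseConfig:
    """Return the phase configuration for a given environment step.

    Assumption: the boundary step itself belongs to Phase 2.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < switch_step:
        # Phase 1: adapt to the action interface while observations stay ground truth.
        return PhaseConfig(phase=1, observation_source="GT-BEV", action_interface="PAM")
    # Phase 2: keep the action interface fixed and swap in GOB-generated observations.
    return PhaseConfig(phase=2, observation_source="GOB-BEV", action_interface="PAM")
```

Only the observation source changes across the boundary, which matches the captions: the action interface stays on PAM throughout, so the two gaps are introduced one at a time.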

Theorems & Definitions (27)

  • Definition 1: Domain-Invariant BEV Representation
  • Remark 1
  • Definition 2: Platform-Agnostic Action Interface
  • Remark 2: Why platform-agnostic actions improve transfer
  • Definition 3: Progressive Observation Curriculum
  • Remark 3: Motivation for progressive training
  • Theorem 1: Zero-Shot Transfer Guarantee
  • Remark 4
  • Remark 1
  • Remark 2
  • ...and 17 more