A Survey of Sim-to-Real Methods in RL: Progress, Prospects and Challenges with Foundation Models

Longchao Da, Justin Turnau, Thirulogasankar Pranav Kutralingam, Alvaro Velasquez, Paulo Shakarian, Hua Wei

TL;DR

The paper addresses the sim-to-real challenge in RL by proposing a formal, MDP-based taxonomy that spans observation, action, transition, and reward dimensions, and by surveying both classical techniques and emerging foundation-model–driven methods. It synthesizes domain-specific insights, benchmarks, and evaluation protocols, while highlighting GenAI-based simulation trends and a publicly maintained research repository. Key contributions include a rigorous taxonomy, a comprehensive literature review across domains, and a discussion of evaluation settings and metrics to quantify transfer gaps. The work aims to unify disparate strands of sim-to-real research, guiding future development toward safer, more scalable deployment of RL in real-world systems.
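The TL;DR mentions metrics for quantifying transfer gaps but does not name them. A minimal sketch of one common choice follows. It assumes the gap is measured as the drop in mean episode return when the same policy moves from simulation to the real system. The function name `transfer_gap` and its relative-gap normalization are our own assumptions, not the paper's evaluation protocol.

```python
import math
from statistics import mean

def transfer_gap(sim_returns, real_returns):
    """Compare mean episode returns of one policy evaluated in sim and in the real system.

    Returns (absolute_gap, relative_gap):
      absolute_gap = mean(sim) - mean(real); positive means performance was lost in transfer.
      relative_gap = absolute_gap / |mean(sim)|; NaN when the sim mean is exactly zero.
    """
    if not sim_returns or not real_returns:
        raise ValueError("need at least one episode from each domain")
    sim_mean = mean(sim_returns)
    real_mean = mean(real_returns)
    absolute_gap = sim_mean - real_mean
    relative_gap = absolute_gap / abs(sim_mean) if sim_mean != 0 else math.nan
    return absolute_gap, relative_gap

# Hypothetical returns from three evaluation episodes per domain.
print(transfer_gap([100.0, 110.0, 90.0], [80.0, 70.0, 75.0]))
```

With these made-up numbers the sim mean is 100 and the real mean is 75, giving an absolute gap of 25 and a relative gap of 0.25. The relative form makes gaps comparable across tasks whose returns live on different scales.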

Abstract

Deep Reinforcement Learning (RL) has been shown to be effective for decision-making tasks in various domains, such as robotics, transportation, and recommender systems. An RL agent learns by interacting with its environment and updates its policy using the collected experience. However, because real-world data are limited and detrimental actions can have unacceptable consequences, RL policies are mostly trained in simulators. This practice keeps learning safe but introduces an unavoidable sim-to-real gap at deployment, which degrades performance and creates risks during execution. Researchers in different domains have attacked the sim-to-real problem with various techniques, and emerging large foundation models and language models have recently shed new light on it. To the best of our knowledge, this survey is the first to propose a taxonomy that formally frames sim-to-real techniques around the key elements of the Markov Decision Process (State, Action, Transition, and Reward). Based on this framework, we cover the literature from classic to the most advanced methods, including sim-to-real techniques empowered by foundation models, and we discuss considerations specific to different domains of sim-to-real problems. We then summarize the formal evaluation process for sim-to-real performance, together with accessible code or benchmarks. Finally, we present challenges and opportunities to encourage future exploration of this direction. We actively maintain a repository of the most up-to-date sim-to-real research to help domain researchers.
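One compact way to write the MDP framing described above is sketched below. This is our own notation, not the paper's equations: the transition kernels $P_s$ and $P_r$ are taken from the Figure 2 caption, while the state and action spaces, the reward functions $R_{sim}$ and $R_{real}$, the discount $\gamma$, and the return $J$ are introduced here for illustration. The simulator and the real system are treated as two MDPs, and the sim-to-real gap of a policy $\pi$ is the drop in its expected discounted return when it is deployed in the real system:

```latex
\begin{aligned}
\mathcal{M}_{sim} &= (\mathcal{S}_{sim}, \mathcal{A}_{sim}, P_s, R_{sim}, \gamma), \qquad
\mathcal{M}_{real} = (\mathcal{S}_{real}, \mathcal{A}_{real}, P_r, R_{real}, \gamma), \\
J_{sim}(\pi) &= \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R_{sim}(s_t, a_t) \,\Big|\, \pi, P_s\Big], \qquad
J_{real}(\pi) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R_{real}(s_t, a_t) \,\Big|\, \pi, P_r\Big], \\
\Delta(\pi) &= J_{sim}(\pi) - J_{real}(\pi).
\end{aligned}
```

Each MDP element that can differ between the two tuples roughly corresponds to one dimension of the taxonomy: state or observation ($\mathcal{S}_{sim}$ vs. $\mathcal{S}_{real}$), action ($\mathcal{A}_{sim}$ vs. $\mathcal{A}_{real}$), transition ($P_s$ vs. $P_r$), and reward ($R_{sim}$ vs. $R_{real}$).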

Paper Structure

This paper contains 44 sections, 3 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Taxonomy of research on Sim-to-Real in RL that consists of the Issues, Techniques, Domain Discussion, and Evaluations.
  • Figure 2: Overview of the causes of Sim-to-Real issues. Four key sim-to-real (Sim2Real) gaps in RL arise from discrepancies between the simulated environment (Env-Sim) and the real-world environment (Env-Real). The Action Gap ($a_t^{real} \neq a_t^{sim}$) originates from differences in the system's mechanical state $\Delta_{system}$ or in action-space granularity $\Delta_{\mathcal{A}}$. The Reward Gap ($r_t^{real} \neq r_t^{sim}$) arises from mismatches in the reward function between the two systems, and also from the granularity of actions $\Delta_{\mathcal{A}}$. The Next State Gap ($s_{t+1}^{real} \neq s_{t+1}^{sim}$) reflects inaccuracies in the transition dynamics of the simulated environment $P_s(\cdot \mid s_t, a_t)$ compared to the real-world dynamics $P_r(\cdot \mid s_t, a_t)$. Lastly, the Observation Gap ($o_t^{real} \neq o_t^{sim}$) arises from incomplete perception modules $\Delta_{perception}$ or a representation mismatch $\Delta_{\mathcal{S}}$. Together, these gaps define the Sim-to-Real challenge in RL.
  • Figure 3: The four major types of Sim-to-Real methods for the Observation aspect, illustrated with an example from mozifian2020intervention. Domain Randomization enhances policy robustness by introducing a wide range of variations into the simulated environments, enabling agents to generalize to diverse real-world scenarios tobin2017domain. Domain Adaptation bridges the gap between simulated and real domains by aligning their feature distributions, so that policies trained in simulation perform consistently in real environments tzeng2017adversarial. Sensor Fusion integrates data from multiple sensors to provide comprehensive and reliable perception of the environment bohez2017sensor, compensating for the limitations of individual sensors; multiple observations give the perception a better grounding and thus mitigate Sim-to-Real issues. Lastly, Foundation Models enrich the agent's depiction of the world by using a VLM to provide additional task-level descriptions and encoding this semantic information into the agent's observations yu2024natural.
  • Figure 4: The taxonomy of action-related methods in sim2real RL.
  • Figure 5: dalal2024local presents a method that uses the zero-shot capabilities of Vision Language Models (VLMs) to perform long-horizon manipulation tasks. Local policies are trained in $E_{sim}$, while the task is executed in $E_{real}$, with the VLM coordinating the actions within the motion plans to accomplish the task.
  • ...and 1 more figure
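Domain randomization, as described in the Figure 3 caption, can be illustrated with a toy example. The sketch below is our own, not a method from the paper. It assumes a hypothetical one-dimensional system $x_{t+1} = x_t + b\,u_t$ whose actuator gain $b$ stands in for the mechanical-state difference behind the Action Gap in Figure 2. The linear policy $u_t = -k x_t$, the candidate gains, and the randomization range are all illustrative choices, and a grid search stands in for an actual RL optimizer.

```python
import random

def rollout(k, b, x0=1.0, steps=20):
    """Return of the linear policy u = -k * x on the toy system x' = x + b * u.

    b is a hypothetical actuator gain, the quantity that differs between sim and real.
    The return is the negative sum of squared states, so 0 is the best possible value.
    """
    x, ret = x0, 0.0
    for _ in range(steps):
        u = -k * x
        x = x + b * u
        ret -= x * x
    return ret

def select_gain(candidate_gains, train_bs):
    """Stand-in for policy optimization: keep the gain with the best mean return
    over the simulator instances it is trained on."""
    return max(candidate_gains,
               key=lambda k: sum(rollout(k, b) for b in train_bs) / len(train_bs))

def randomized_bs(n, low, high, seed=0):
    """Domain randomization: draw a fresh actuator gain for every training episode."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]

GAINS = [0.25, 0.5, 0.75, 1.0]
nominal_gain = select_gain(GAINS, [1.0])                        # one fixed, nominal simulator
robust_gain = select_gain(GAINS, randomized_bs(500, 0.5, 1.8))  # randomized simulators

real_b = 1.8  # the hypothetical "real" actuator is stronger than the nominal simulator assumes
print("nominal gain:", nominal_gain, "real return:", rollout(nominal_gain, real_b))
print("randomized gain:", robust_gain, "real return:", rollout(robust_gain, real_b))
```

The gain tuned only on the nominal simulator ($b = 1$) is perfect there but overshoots when the real actuator is stronger. The gain chosen across randomized simulators gives up some nominal performance in exchange for acceptable behavior over the whole range of $b$, which is the trade-off domain randomization is meant to make.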