Table of Contents
Fetching ...

A Digital Twin Framework for Reinforcement Learning with Real-Time Self-Improvement via Human Assistive Teleoperation

Kabirat Olayemi, Mien Van, Luke Maguire, Sean McLoone

TL;DR

This work introduces a digital-twin reinforcement learning framework for UGV navigation that enables real-time self-improvement after deployment via human-assisted teleoperation. A three-phase approach combines simulation-based pre-training, a digital twin that mirrors the real robot for safe testing, and a retraining phase augmented by minimal human input to mitigate catastrophic forgetting. The method uses TD3 with a priority replay buffer and a TD3 Digital Twin Retraining extension to bridge sim-to-real gaps and accelerate adaptation in dynamic environments. Experimental results in both simulation and real-world tests show improved reward accumulation, higher success rates, and reduced collisions compared with TD3 and TD3DT baselines, highlighting the practical impact for robust autonomous navigation.

Abstract

Reinforcement Learning (RL) or Deep Reinforcement Learning (DRL) is a powerful approach to solving Markov Decision Processes (MDPs) when the model of the environment is not known a priori. However, RL models are still faced with challenges such as handling covariate shifts and ensuring the quality of human demonstration. To address these challenges and further advance DRL models, our work develops a human-in-the-loop DRL framework via digital twin that leverages human intelligence after deployment to retrain the DRL model in real time. First, we develop a pre-trained model fully based on learning through trial and error in the simulated environment allowing scalability and automation while eliminating variability and biases that can come from subjective human guidance. Second, instead of deploying the trained model directly on the UGV, we create a digital twin which controls the physical UGV from the virtual environment. Third, to allow continuous learning without catastrophic forgetting, we introduce the ability of the model to self-improve with the help of small human guidance at the start of the retraining. We test the performance of our proposed model in both simulation and real-world environments with both static and dynamic obstacles. The results indicate that our proposed approach not only outperforms the baseline models in terms of reward accumulation but also demonstrates superior training efficiency.

A Digital Twin Framework for Reinforcement Learning with Real-Time Self-Improvement via Human Assistive Teleoperation

TL;DR

This work introduces a digital-twin reinforcement learning framework for UGV navigation that enables real-time self-improvement after deployment via human-assisted teleoperation. A three-phase approach combines simulation-based pre-training, a digital twin that mirrors the real robot for safe testing, and a retraining phase augmented by minimal human input to mitigate catastrophic forgetting. The method uses TD3 with a priority replay buffer and a TD3 Digital Twin Retraining extension to bridge sim-to-real gaps and accelerate adaptation in dynamic environments. Experimental results in both simulation and real-world tests show improved reward accumulation, higher success rates, and reduced collisions compared with TD3 and TD3DT baselines, highlighting the practical impact for robust autonomous navigation.

Abstract

Reinforcement Learning (RL) or Deep Reinforcement Learning (DRL) is a powerful approach to solving Markov Decision Processes (MDPs) when the model of the environment is not known a priori. However, RL models are still faced with challenges such as handling covariate shifts and ensuring the quality of human demonstration. To address these challenges and further advance DRL models, our work develops a human-in-the-loop DRL framework via digital twin that leverages human intelligence after deployment to retrain the DRL model in real time. First, we develop a pre-trained model fully based on learning through trial and error in the simulated environment allowing scalability and automation while eliminating variability and biases that can come from subjective human guidance. Second, instead of deploying the trained model directly on the UGV, we create a digital twin which controls the physical UGV from the virtual environment. Third, to allow continuous learning without catastrophic forgetting, we introduce the ability of the model to self-improve with the help of small human guidance at the start of the retraining. We test the performance of our proposed model in both simulation and real-world environments with both static and dynamic obstacles. The results indicate that our proposed approach not only outperforms the baseline models in terms of reward accumulation but also demonstrates superior training efficiency.
Paper Structure (23 sections, 6 equations, 8 figures, 3 tables, 4 algorithms)

This paper contains 23 sections, 6 equations, 8 figures, 3 tables, 4 algorithms.

Figures (8)

  • Figure 1: We present the overall framework of pur proposed methodology. In phase 1, we pre-trained the TD3 agent in simulation storing experiences of simulation in both replay and priority buffers. The deployment integrates both phases 2 and 3. Phase 2 presents the creation of the digital twin and phase 3 controls the operation of the whole system switching between the AI model or human expert guidaiance.
  • Figure 2: Husky A200 UGV is a differential-drive implemented on Ubuntu 20.04 operating system and supported by Robot Operating System (ROS).
  • Figure 3: At the start of each episode during training in the simulation, the UGV is randomly spawned at different locations maintaining a safe distance between both the goal(green star) and and the obstacles. The walls in the world are static throughout the training while other obstacles i.e. the box, cylinder and sphere are also randomly positioned at the start of a new episode.
  • Figure 4: Examples of environment scenes used for testing the model (a) In this environment scene, all objects are static with a minimum distance of over $2 m$ between the robot and the obstacles. (b) In this environment, we moved one of the obstacles closer to the robot maintaining just $1.1 m$ distance between the robot and the obstacle.
  • Figure 5: This figure shows some of the setups of the environment used for training our model. The setup (a) consists of sparely distributed static objects. (b) is showing the twin of (a) in simulation. Similarly, (c) is another environment setup with densely distributed obstacles and (d) is its representative twin in simulation. The green boxes represent the objects in the physical world that are above $1.5 m$ away from the robot while the red boxes are obstacles below $1.5m$ way to the robot. This helps to easily track impending obstacles.
  • ...and 3 more figures