XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots

Tianyi Wang, Jiseop Byeon, Ahmad Yehia, Huihai Wang, Yiming Xu, Tianyi Zeng, Ziran Wang, Junfeng Jiao, Christian Claudel

TL;DR

XR-DT presents a hierarchical Extended Reality-Enhanced Digital Twin to advance safe, interpretable HRI for agentic mobile robots. By integrating VR for simulation, AR for real-world overlays, and MR as the integration layer, it leverages chain-of-thought prompting and a unified diffusion policy within a two-agent AutoGen multi-agent system (MAS) to coordinate planning and execution. The framework is validated through human and robot trajectory-prediction experiments, showing improved accuracy when incorporating egocentric cues and robust multimodal reasoning. These results demonstrate XR-DT’s potential to enable trustworthy, adaptive HRI in dynamic, shared environments.

Abstract

As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While substantial effort has been devoted to human behavior prediction, limited attention has been paid to how humans perceive, interpret, and trust robots' inferences, which impedes deployment in safety-critical and socially embedded environments. This paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for agentic mobile robots that bridges physical and virtual spaces to enable bi-directional understanding between humans and robots. Our hierarchical XR-DT architecture integrates virtual-, augmented-, and mixed-reality layers, fusing real-time sensor data, simulated environments in the Unity game engine, and human feedback captured through wearable AR devices. Within this framework, we design an agentic mobile robot system with a unified diffusion policy for context-aware task adaptation. We further propose a chain-of-thought prompting mechanism that allows multimodal large language models to reason over human instructions and environmental context, while leveraging an AutoGen-based multi-agent coordination layer to enhance robustness and collaboration in dynamic tasks. Initial experimental results demonstrate accurate human and robot trajectory prediction, validating the XR-DT framework's effectiveness in HRI tasks. By embedding human intention, environmental dynamics, and robot cognition into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.

Paper Structure

This paper contains 26 sections, 3 equations, 5 figures.

Figures (5)

  • Figure 1: Extended reality-enhanced digital twin framework.
  • Figure 2: Workflow of human-robot interaction.
  • Figure 3: Operating system overview.
  • Figure 4: Total-path human trajectory prediction.
  • Figure 5: 4-second human and robot trajectory prediction.