MobRT: A Digital Twin-Based Framework for Scalable Learning in Mobile Manipulation

Yilin Mei; Peng Qiu; Wei Zhang; WenChao Zhang; Wenjie Song

TL;DR

MobRT is a digital-twin framework for scaling demonstration-data generation in mobile manipulation, supporting coherent whole-body execution of two task families: interaction with articulated objects and mobile-base pick-and-place. It combines Virtual Kinematic Chains, whole-body motion planning, and a Transformer-based diffusion policy trained with Flow Matching, and it augments the simulated data with real-world demonstrations to improve sim-to-real transfer. A comprehensive MobRT benchmark validates data quality and shows that additional generated trajectories consistently raise policy success rates, with the proposed method outperforming strong baselines, especially in data-scarce settings. Co-training on mixed simulated and real data further improves real-world robustness, underscoring MobRT’s practical value for mobile manipulation in unstructured environments.
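The TL;DR states that the policy is trained with Flow Matching but gives no further detail. As a rough illustration only, the sketch below shows the standard conditional flow-matching objective with a straight-line (rectified-flow style) path from Gaussian noise to a demonstrated action chunk, plus Euler integration at inference time. The function names, the action-chunk shape, and the omission of observation conditioning are assumptions; in MobRT a Transformer would predict the velocity, whereas here a closed-form toy field stands in for the network.

```python
import numpy as np


def flow_matching_pair(action_chunk, rng):
    """Build one (x_t, t, target velocity) training triple for conditional flow
    matching with a straight-line path from Gaussian noise to a demonstration."""
    x0 = rng.standard_normal(action_chunk.shape)  # noise sample (t = 0)
    t = rng.uniform()                             # time drawn from U[0, 1)
    x_t = (1.0 - t) * x0 + t * action_chunk       # point on the straight path
    target_velocity = action_chunk - x0           # d(x_t)/dt along that path
    return x_t, t, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from noise (t = 0) to an action (t = 1)."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical action chunk: 8 future steps x 10 whole-body DOF.
    demo = rng.standard_normal((8, 10))

    def oracle(x, t):
        # Toy stand-in for a learned network: the exact velocity field when
        # the dataset contains only `demo`.
        return (demo - x) / (1.0 - t)

    x_t, t, u = flow_matching_pair(demo, rng)
    print("loss of oracle:", flow_matching_loss(oracle(x_t, t), u))
    sample = euler_sample(oracle, rng.standard_normal(demo.shape))
    print("recovers demo:", np.allclose(sample, demo))
```

With a single demonstration the ideal velocity field is (a - x)/(1 - t), so the oracle reaches near-zero training loss and Euler integration lands on the demonstration; a learned model would only approximate this field across many demonstrations and observations.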

Abstract

Recent advances in robotics have been driven largely by imitation learning, which depends critically on large-scale, high-quality demonstration data. However, collecting such data remains a significant challenge, particularly for mobile manipulators, which must coordinate base locomotion and arm manipulation in high-dimensional, dynamic, and partially observable environments. Consequently, most existing research focuses on simpler tabletop scenarios, leaving mobile manipulation relatively underexplored. To bridge this gap, we present MobRT, a digital twin-based framework that simulates two primary categories of complex whole-body tasks: interaction with articulated objects (e.g., opening doors and drawers) and mobile-base pick-and-place. MobRT autonomously generates diverse, realistic demonstrations by integrating virtual kinematic control with whole-body motion planning, yielding coherent and physically consistent execution. We evaluate the quality of MobRT-generated data across multiple baseline algorithms, establishing a comprehensive benchmark and showing a strong correlation between task success and the number of generated trajectories. Experiments that combine simulated and real-world demonstrations confirm that our approach markedly improves policy generalization and performance, achieving robust results in both simulated and real-world environments.
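The abstract attributes coherent whole-body execution to combining virtual kinematic control with whole-body motion planning, without spelling out the formulation. One common way to realize the idea, assumed here for illustration, is to prepend virtual joints for the mobile base (prismatic x, prismatic y, revolute yaw) to the arm's kinematic chain so that a single solver moves base and arm together. The planar sketch below uses a hypothetical two-link arm, hypothetical link lengths, and a damped-least-squares inverse-kinematics step; it omits the articulated-object joints a full virtual-kinematic-chain model would also attach, as well as collision avoidance and nonholonomic base constraints.

```python
import numpy as np

# Hypothetical planar arm link lengths (metres), mounted at the base origin.
LINK_LENGTHS = (0.5, 0.4)


def se2(x, y, theta):
    """Homogeneous 3x3 transform for a planar translation + rotation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])


def whole_body_fk(q, link_lengths=LINK_LENGTHS):
    """End-effector position for q = [base_x, base_y, base_yaw, arm_q1, arm_q2].

    The first three entries are virtual joints (assumed: prismatic x,
    prismatic y, revolute yaw) standing in for the mobile base.
    """
    T = se2(q[0], q[1], q[2])              # virtual chain: world -> base
    for angle, length in zip(q[3:], link_lengths):
        T = T @ se2(0.0, 0.0, angle)       # revolute arm joint
        T = T @ se2(length, 0.0, 0.0)      # rigid link along the local x-axis
    return T[:2, 2]                        # end-effector position in world frame


def numeric_jacobian(q, eps=1e-6):
    """Central-difference Jacobian of the end-effector position w.r.t. q."""
    J = np.zeros((2, len(q)))
    for i in range(len(q)):
        dq = np.zeros(len(q))
        dq[i] = eps
        J[:, i] = (whole_body_fk(q + dq) - whole_body_fk(q - dq)) / (2.0 * eps)
    return J


def dls_step(q, target, damping=0.05):
    """One damped-least-squares IK step that moves base and arm jointly."""
    J = numeric_jacobian(q)
    err = target - whole_body_fk(q)
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return q + dq


if __name__ == "__main__":
    q = np.zeros(5)
    target = np.array([3.0, 1.0])          # out of reach for the arm alone
    for _ in range(50):
        q = dls_step(q, target)
    print("joint config:", np.round(q, 3))
    print("reach error:", np.linalg.norm(whole_body_fk(q) - target))
```

Because the two virtual prismatic columns of the Jacobian are unit vectors, J Jᵀ is never smaller than the identity, so the solver stays well-conditioned and the base supplies the reach that the arm alone cannot.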
