Whole-Body Teleoperation for Mobile Manipulation at Zero Added Cost

Daniel Honerkamp, Harsh Mahesheka, Jan Ole von Hartz, Tim Welschehold, Abhinav Valada

TL;DR

MoMa-Teleop is a whole-body teleoperation framework for mobile manipulators that adds no hardware or setup cost: the operator controls only the end-effector through standard interfaces, while a pretrained reinforcement learning agent controls the base and torso so that the resulting whole-body trajectories are feasible and avoid collisions. Operator inputs are converted into end-effector motions $m_{ee}$ and passed to an N$^2$M$^2$ base policy, which enables smooth, anticipatory movement and robust operation in cluttered environments. Across multiple robots and tasks, the approach shortens task completion times and yields higher-quality data for imitation learning, and it supports learning new skills from as few as five demonstrations using TAPAS-GMM. The work also analyzes data quality and generalization, showing that end-effector-only policies generalize better to new contexts than joint base–end-effector models, and it releases code publicly for community use and large-scale data collection.

Abstract

Demonstration data plays a key role in learning complex behaviors and training robotic foundation models. While effective control interfaces exist for static manipulators, data collection remains cumbersome and time-intensive for mobile manipulators due to their large number of degrees of freedom. Specialized hardware, avatars, or motion tracking can enable whole-body control, but these approaches are expensive, robot-specific, or suffer from the embodiment mismatch between robot and human demonstrator. In this work, we present MoMa-Teleop, a novel teleoperation method that infers end-effector motions from existing interfaces and delegates the base motions to a previously developed reinforcement learning agent, leaving the operator to focus fully on the task-relevant end-effector motions. This enables whole-body teleoperation of mobile manipulators with no additional hardware or setup costs via standard interfaces such as joysticks or hand guidance. Moreover, the operator is not bound to a tracked workspace and can move freely with the robot over spatially extended tasks. We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks. As the generated data covers diverse whole-body motions without embodiment mismatch, it enables efficient imitation learning. By focusing on task-specific end-effector motions, our approach learns skills that transfer to unseen settings, such as new obstacles or changed object positions, from as few as five demonstrations. We make code and videos available at https://moma-teleop.cs.uni-freiburg.de.
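The division of labor described in the abstract can be illustrated with a minimal planar sketch. It is not the MoMa-Teleop implementation: the joystick mapping, the `PlanarState` type, all parameter values, and in particular `placeholder_base_velocity` are assumptions made for illustration. The placeholder is a hand-written rule that stands in for the learned base agent and ignores obstacles and robot kinematics, which the actual agent is described as taking into account.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarState:
    """Hypothetical 2D state used only for this illustration."""
    ee_x: float    # end-effector target position in the world frame [m]
    ee_y: float
    ee_yaw: float  # end-effector heading in the world frame [rad]
    base_x: float  # base position in the world frame [m]
    base_y: float


def joystick_to_ee_velocity(axis_forward, axis_left, ee_yaw, max_speed=0.1):
    """Map joystick axes in [-1, 1] to a world-frame end-effector velocity.

    The axes are interpreted in the end-effector frame (an assumption),
    scaled to max_speed, and rotated into the world frame.
    """
    forward = max(-1.0, min(1.0, axis_forward))
    left = max(-1.0, min(1.0, axis_left))
    vx_ee, vy_ee = forward * max_speed, left * max_speed
    vx = math.cos(ee_yaw) * vx_ee - math.sin(ee_yaw) * vy_ee
    vy = math.sin(ee_yaw) * vx_ee + math.cos(ee_yaw) * vy_ee
    return vx, vy


def placeholder_base_velocity(state, comfortable_reach=0.6, gain=1.0,
                              max_speed=0.2):
    """Hand-written stand-in for the learned base agent (NOT the RL policy).

    The base stays put while the end-effector target is within reach and
    otherwise drives toward it, proportionally to the excess distance.
    """
    dx = state.ee_x - state.base_x
    dy = state.ee_y - state.base_y
    dist = math.hypot(dx, dy)
    excess = dist - comfortable_reach
    if excess <= 0.0:
        return 0.0, 0.0
    speed = min(max_speed, gain * excess)
    return speed * dx / dist, speed * dy / dist


def step(state, axis_forward, axis_left, dt=0.1):
    """One control step: operator moves the end-effector, base follows."""
    vx, vy = joystick_to_ee_velocity(axis_forward, axis_left, state.ee_yaw)
    ee_x = state.ee_x + vx * dt
    ee_y = state.ee_y + vy * dt
    moved = PlanarState(ee_x, ee_y, state.ee_yaw, state.base_x, state.base_y)
    bvx, bvy = placeholder_base_velocity(moved)
    return PlanarState(ee_x, ee_y, state.ee_yaw,
                       state.base_x + bvx * dt, state.base_y + bvy * dt)


def run_forward(steps=50):
    """Hold the joystick fully forward for a number of steps."""
    state = PlanarState(ee_x=0.5, ee_y=0.0, ee_yaw=0.0, base_x=0.0, base_y=0.0)
    for _ in range(steps):
        state = step(state, axis_forward=1.0, axis_left=0.0)
    return state
```

Calling `run_forward()` moves the end-effector target steadily forward while the base starts following only once the target leaves the assumed comfortable reach. The operator never issues a base command, which is the point of the modular design; a learned agent would replace the placeholder rule.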

Paper Structure (24 sections, 3 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Operating mobile manipulators requires controlling a large number of degrees of freedom to move the base (red), arm (yellow), and end-effector (green), which typically calls for multiple input devices or expensive exoskeletons. MoMa-Teleop infers end-effector motions from the operator and communicates them to a reinforcement learning agent, which converts them into whole-body motions by moving the base in compliance.
  • Figure 2: MoMa-Teleop: We modularize teleoperation for mobile manipulators. The human operator controls the end-effector of the robot through a range of possible interfaces. A reinforcement learning agent then transforms these commands into whole-body commands, moving the base in compliance to achieve the operator's desired motions while respecting the robot's kinematics and obstacle constraints.
  • Figure 3: Teleoperation tasks on the HSR (left) and FMM (right) robots.
  • Figure 4: Average completion times of new users. Bars indicate standard errors.
  • Figure S.1: Left: Reference frame for the control inputs in the wrist camera view of the HSR robot and button assignment of MoMa-Teleop. Right: Button assignment of the original teleoperation ROS package developed for the HSR.
  • ...and 4 more figures