TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation

Shivin Dass; Wensi Ai; Yuqian Jiang; Samik Singh; Jiaheng Hu; Ruohan Zhang; Peter Stone; Ben Abbatematteo; Roberto Martín-Martín

TL;DR

This work presents TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. It unifies multiple human interfaces, including RGB and depth cameras, virtual reality controllers, keyboards, and joysticks, in any combination, to help researchers collect whole-body mobile manipulation demonstrations.

Abstract

A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation due to the lack of available and easy-to-use teleoperation interfaces. In this work, we demonstrate TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces including RGB and depth cameras, virtual reality controllers, keyboard, joysticks, etc., and any combination thereof. In its more accessible version, TeleMoMa works using only vision (e.g., an RGB-D camera), lowering the entry bar for humans to provide mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators - PAL Tiago++, Toyota HSR, and Fetch - in simulation and the real world. We demonstrate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we show that TeleMoMa's teleoperation channel enables teleoperation either on site, looking at the robot, or remotely, sending commands and observations through a computer network. We also perform user studies to evaluate how easy it is for novice users to learn to collect demonstrations with different combinations of human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community, enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, see https://robin-lab.cs.utexas.edu/telemoma-web.

Paper Structure (23 sections, 7 figures, 4 tables)

Introduction
Related Work
TeleMoMa System
Human Interface
Vision-Based Human Interface
Virtual Reality Controllers as Human Interface
Teleoperation Channel
Robot Interface
Experiments
User Study
Imitation Learning with TeleMoMa's Data
Remote Teleoperation
Comparing Different Embodiments and Sim vs. Real
Sim vs. Real
Comparing Embodiments
...and 8 more sections

Figures (7)

Figure 1: TeleMoMa: a modular and versatile mobile manipulation teleoperation system. Left and Middle: Demonstrators performing a bimanual sweeping task with the vision-only, virtual reality (VR), and a combination of vision+VR interfaces. TeleMoMa enables multiple human interfaces and their combination. Middle and Right: Tiago (real), HSR (real), and Fetch (simulation), three of the robot platforms that we demonstrate teleoperated for different mobile manipulation tasks with TeleMoMa, demonstrating its versatility.
Figure 2: TeleMoMa System. TeleMoMa consists of three components: the Human Interface acquires commands from the human using different input devices; the Teleoperation Channel defines the action command structure between the human and the robot interfaces, and, possibly, closes the loop with observations from the robot; and the Robot Interface implements a robot-specific mapping of actions to low-level robot commands. This architecture enables modularity and versatility -- combining multiple devices to achieve intuitive whole-body teleoperation for multiple tasks and robots.
Figure 3: Tasks in our evaluation of TeleMoMa. Shown above are the initial and goal states of each task.
Figure 4: User Study 1: Completion Time. Vision modalities outperform only-VR for the more challenging dusting task. Error bars denote the standard error of the mean.
Figure 5: User Study 2: User Improvement, Learning Curve. New users generally improve at completing the pick up task with TeleMoMa across teleoperation modalities. Transparent lines show individual learning curves.
...and 2 more figures
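
The three-component architecture described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not TeleMoMa's actual API: every name here (`TeleopAction`, `HumanInterface`, `RobotInterface`, `KeyboardBase`, `VRArm`, `LoggingRobot`, `merge_actions`, `teleop_step`) is hypothetical, and the per-field override rule used to combine devices is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Sequence


@dataclass
class TeleopAction:
    """Hypothetical action message exchanged over the teleoperation channel."""
    base: Optional[List[float]] = None   # assumed: planar base velocity [vx, vy, wz]
    arm: Optional[List[float]] = None    # assumed: end-effector pose delta
    gripper: Optional[float] = None      # assumed: 0.0 = open, 1.0 = closed


class HumanInterface(Protocol):
    """Any input device that can produce a (possibly partial) action."""
    def get_action(self) -> TeleopAction: ...


class RobotInterface(Protocol):
    """Robot-specific mapping from actions to low-level commands."""
    def apply(self, action: TeleopAction) -> Dict[str, object]: ...


class KeyboardBase:
    """Stand-in device that only commands the mobile base."""
    def get_action(self) -> TeleopAction:
        return TeleopAction(base=[0.1, 0.0, 0.0])


class VRArm:
    """Stand-in device that only commands the arm and gripper."""
    def get_action(self) -> TeleopAction:
        return TeleopAction(arm=[0.0, 0.0, 0.01, 0.0, 0.0, 0.0], gripper=1.0)


def merge_actions(actions: Sequence[TeleopAction]) -> TeleopAction:
    """Combine device outputs; later devices override earlier ones per field."""
    merged = TeleopAction()
    for action in actions:
        for name in ("base", "arm", "gripper"):
            value = getattr(action, name)
            if value is not None:
                setattr(merged, name, value)
    return merged


class LoggingRobot:
    """Stand-in robot interface that records the actions it receives."""
    def __init__(self) -> None:
        self.received: List[TeleopAction] = []

    def apply(self, action: TeleopAction) -> Dict[str, object]:
        self.received.append(action)
        # A real robot interface would return sensor observations here.
        return {"step": len(self.received)}


def teleop_step(devices: Sequence[HumanInterface], robot: RobotInterface):
    """One loop iteration: read devices, merge, send, return the observation."""
    action = merge_actions([device.get_action() for device in devices])
    observation = robot.apply(action)
    return action, observation


if __name__ == "__main__":
    robot = LoggingRobot()
    action, observation = teleop_step([KeyboardBase(), VRArm()], robot)
    print(action, observation)
```

Because each device fills only the fields it controls, swapping or combining input devices in this sketch requires no change to the robot side, which mirrors the modularity the caption describes.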
