Continuously Improving Mobile Manipulation with Autonomous Real-World RL

Russell Mendonca; Emmanuel Panov; Bernadette Bucher; Jiuguang Wang; Deepak Pathak

TL;DR

This work presents a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision, and demonstrates that this approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks.

Abstract

We present a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision. This is enabled by 1) task-relevant autonomy, which guides exploration towards object interactions and prevents stagnation near goal states, 2) efficient policy learning that leverages basic task knowledge in behavior priors, and 3) generic rewards that combine human-interpretable semantic information with low-level, fine-grained observations. We demonstrate that our approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks, obtaining an average success rate of 80% across tasks, a 3-4× improvement over existing approaches. Videos can be found at https://continual-mobile-manip.github.io/

Paper Structure (21 sections, 6 equations, 10 figures, 5 tables, 2 algorithms)

Introduction
Related Work
Autonomous Real-World RL
Mobile Manipulation
Continuously Improving Mobile Manipulation via Real-world RL
Task-Relevant Autonomy
Prior-guided Policy Learning
Flexible Supervision via Text-Prompted Segmentation
Experimental Setup
Results
Discussion and Limitations
Acknowledgements
Videos
Policy Training
Rewards
...and 6 more sections

Figures (10)

Figure 1: Continual Autonomous Learning: We enable a legged mobile manipulator to learn a variety of tasks such as moving chairs (top, left and right), righting a dustpan (top, middle), and sweeping (bottom) via practice in the real world with minimal human intervention.
Figure 2: Method Overview: The main components of our approach for robots to continually practice tasks in the real world. Left: Task-relevant autonomy to ensure collection of useful data via object interaction, and maintaining state diversity via automated resets using multi-goal and multi-robot setups. Center: Efficient control by aiding policy learning with basic task knowledge present in behavior priors in the form of planners with a simplified model or automated behaviors. Right: Flexible reward supervision that combines human-interpretable semantic detection-segmentation information with low-level, fine-grained depth observation.
Figure 3: Task Goals: States that define goal-cycles for our 4 tasks: (a-b) Chair Moving with a corner table, (c-d) Chair Moving with a middle table, (e-f) Long Handled Dustpan Standup, (g-h) Sweeping.
Figure 4: Continual training improvement: Success rate vs number of samples for ours, only RL and only prior. Note that we use our task-relevant autonomy approach with all methods. We see that our approach continuously improves with experience across tasks, learning much faster than RL without priors, and attaining significantly higher performance than just using the prior.
Figure 5: Training mean reward: Mean reward vs number of samples for the chair moving tasks. The negative average reward for RL without priors indicates that the robot is often far from the goal location.
...and 5 more figures
