Continuously Improving Mobile Manipulation with Autonomous Real-World RL

Russell Mendonca, Emmanuel Panov, Bernadette Bucher, Jiuguang Wang, Deepak Pathak

TL;DR

This work presents a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision, and demonstrates that this approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks.

Abstract

We present a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision. This is enabled by 1) task-relevant autonomy, which guides exploration towards object interactions and prevents stagnation near goal states, 2) efficient policy learning by leveraging basic task knowledge in behavior priors, and 3) formulating generic rewards that combine human-interpretable semantic information with low-level, fine-grained observations. We demonstrate that our approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks, obtaining an average success rate of 80% across tasks, a 3-4× improvement over existing approaches. Videos can be found at https://continual-mobile-manip.github.io/
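As a rough illustration of the third component, the sketch below combines a semantic object mask with per-pixel 3D points derived from depth to produce a distance-based reward. It is not the authors' implementation: the function names, the input layout (nested lists), the centroid heuristic, and the `missing_penalty` value are all assumptions made for this example. The only detail taken from the source is that rewards can be negative when the object is far from the goal.

```python
import math


def masked_centroid(mask, points):
    """Mean 3D position of the points selected by a segmentation mask.

    mask:   rows of bools (True where the hypothetical detector sees the object).
    points: same shape as mask; each entry is an (x, y, z) tuple in metres,
            or None where the depth reading is invalid.
    Returns None if no valid masked point exists.
    """
    selected = [
        p
        for mask_row, point_row in zip(mask, points)
        for is_object, p in zip(mask_row, point_row)
        if is_object and p is not None
    ]
    if not selected:
        return None
    n = len(selected)
    return tuple(sum(p[axis] for p in selected) / n for axis in range(3))


def distance_reward(mask, points, goal_xyz, missing_penalty=-5.0):
    """Negative Euclidean distance from the object centroid to the goal.

    missing_penalty is an assumed fallback for frames where the object
    is not detected; the value is arbitrary.
    """
    centroid = masked_centroid(mask, points)
    if centroid is None:
        return missing_penalty
    return -math.dist(centroid, goal_xyz)


# Toy 2x2 frame: two object pixels, one background pixel, one invalid depth.
mask = [[True, False], [True, False]]
points = [[(1.0, 0.0, 0.0), (9.0, 9.0, 9.0)], [(3.0, 0.0, 0.0), None]]
reward = distance_reward(mask, points, goal_xyz=(2.0, 0.0, 4.0))
```

In this toy frame the object centroid is (2, 0, 0), which lies 4 m from the goal, so the reward is -4.0. A reward of this shape moves towards zero as the object approaches the goal.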

Paper Structure (21 sections, 6 equations, 10 figures, 5 tables, 2 algorithms)

Figures (10)

  • Figure 1: Continual Autonomous Learning: We enable a legged mobile manipulator to learn a variety of tasks such as moving chairs (top, left and right), righting a dustpan (top, middle), and sweeping (bottom) via practice in the real world with minimal human intervention.
  • Figure 2: Method Overview: The main components of our approach for robots to continually practice tasks in the real world. Left: Task-relevant autonomy to ensure collection of useful data via object interaction, and maintaining state diversity via automated resets using multi-goal and multi-robot setups. Center: Efficient control by aiding policy learning with basic task knowledge present in behavior priors in the form of planners with a simplified model or automated behaviors. Right: Flexible reward supervision that combines human-interpretable semantic detection-segmentation information with low-level, fine-grained depth observation.
  • Figure 3: Task Goals: States that define goal-cycles for our 4 tasks: (a-b) Chair Moving with a corner table, (c-d) Chair Moving with a middle table, (e-f) Long Handled Dustpan Standup, (g-h) Sweeping.
  • Figure 4: Continual training improvement: Success rate vs. number of samples for our method, RL only, and the prior only. Note that we use our task-relevant autonomy approach with all methods. We see that our approach continuously improves with experience across tasks, learning much faster than RL without priors, and attaining significantly higher performance than just using the prior.
  • Figure 5: Training mean reward: Mean reward vs number of samples for the chair moving tasks. The negative average reward for RL without priors indicates that the robot is often far from the goal location.
  • ...and 5 more figures