LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery

Weikang Wan, Yifeng Zhu, Rutav Shah, Yuke Zhu

TL;DR

LOTUS is a continual imitation learning algorithm that enables a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. It outperforms state-of-the-art baselines by over 11% in success rate, indicating better knowledge transfer than prior methods.

Abstract

We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. The core idea behind LOTUS is constructing an ever-growing skill library from a sequence of new tasks with a small number of human demonstrations. LOTUS starts with a continual skill discovery process using an open-vocabulary vision model, which extracts skills as recurring patterns presented in unsegmented demonstrations. Continual skill discovery updates existing skills to avoid catastrophic forgetting of previous tasks and adds new skills to solve novel tasks. LOTUS trains a meta-controller that flexibly composes various skills to tackle vision-based manipulation tasks in the lifelong learning process. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in success rate, showing its superior knowledge transfer ability compared to prior methods. More results and videos can be found on the project website: https://ut-austin-rpl.github.io/Lotus/.

Paper Structure

This paper contains 10 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 2: Method Overview. LOTUS is a continual imitation learning algorithm based on unsupervised skill discovery. LOTUS starts from the base task stage, where it builds an initial library of sensorimotor skills. In the subsequent lifelong task stage, it continuously discovers new skills from a stream of incoming tasks and adds them to its skill library. A high-level meta-controller composes skills from the library to solve new manipulation tasks. Newly acquired skills are marked in the library.
  • Figure 3: LOTUS consists of two processes: continual skill discovery with open-world perception and hierarchical policy learning with the skill library. For continual skill discovery, we obtain temporal segments from demonstrations using hierarchical clustering with DINOv2 features and incrementally cluster the temporal segments into partitions to either update existing skills or learn new skills. For the hierarchical policy, a meta-controller $\pi^{H}$ selects a skill by predicting an index $k$ and specifies the subgoals for the selected skill $\pi^{L}_{k}$ to achieve. Note that because a transformer is permutation invariant with respect to its input tokens, we also add sinusoidal positional encoding to the input tokens to inform the transformers of their temporal order (Vaswani et al., 2017). We omit this information in the figure for clarity.
  • Figure 4: LOTUS continually discovers skills from real-world tasks over the robot's lifespan of learning (each color represents a skill). The skill (in blue) used for "reaching the bottom drawer" is also used for "pushing the oven tray inside" because the two motions are similar in action space, which shows the forward transfer of skills enabled by LOTUS. The skill (in purple) discovered in Step 3 of "putting the yellow bowl on the oven tray" is used for a previous task, "putting the bread on the oven tray," demonstrating the backward transfer of learned skills in our method.
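
The Figure 3 caption notes that sinusoidal positional encoding is added to the input tokens so that the transformers can tell their temporal order. The sketch below illustrates that one step using the standard formulation from Vaswani et al. (2017). It is a minimal, pure-Python illustration and not the authors' implementation: the function names `sinusoidal_positional_encoding` and `add_positions`, the base of 10000, and the even token dimension are all assumptions made for this example.

```python
import math


def sinusoidal_positional_encoding(num_tokens, dim):
    """Build a num_tokens x dim table of sinusoidal position codes.

    Hypothetical helper for illustration. Uses the standard formulation:
    PE[pos][2i] = sin(pos / 10000^(2i/dim)), PE[pos][2i+1] = cos(same angle).
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even in this sketch")
    table = []
    for pos in range(num_tokens):
        row = []
        for even_index in range(0, dim, 2):
            angle = pos / (10000 ** (even_index / dim))
            row.append(math.sin(angle))  # value at the even dimension
            row.append(math.cos(angle))  # value at the following odd dimension
        table.append(row)
    return table


def add_positions(tokens):
    """Add the position code element-wise to each token vector (assumed scheme)."""
    codes = sinusoidal_positional_encoding(len(tokens), len(tokens[0]))
    return [
        [value + code for value, code in zip(token, row)]
        for token, row in zip(tokens, codes)
    ]


if __name__ == "__main__":
    # Three identical tokens become distinguishable once positions are added.
    tokens = [[0.5, 0.5, 0.5, 0.5] for _ in range(3)]
    for vector in add_positions(tokens):
        print([round(v, 3) for v in vector])
```

Without the added codes, the three identical tokens in the usage example would be indistinguishable to an order-agnostic model. With them, each position receives a distinct vector.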