Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets

Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn

TL;DR

Behavior Retrieval introduces a retrieval-based fine-tuning paradigm for imitation learning that uses a small set of task-specific demonstrations to selectively pull relevant transitions from a large unlabeled offline dataset. It learns a state-action embedding from the prior data, uses embedding-based similarity to retrieve pertinent transitions, and trains a policy on the combined dataset, improving stability and performance over naive pre-training or naive data mixing. Across simulated and real robotic manipulation tasks, it outperforms traditional pre-training+finetuning and other retrieval strategies by significant margins and demonstrates robustness to distribution shifts. The approach enables leveraging diverse offline data to achieve data-efficient, high-performance imitation on novel tasks with minimal human supervision.

Abstract

Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many behaviors in them and then adapting a policy to a specific task using a small amount of task-specific human supervision (i.e. interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the relevant transitions to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it is able to learn more effectively from the mix of task-specific and offline data compared to naively mixing the data or only using the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.

Paper Structure

This paper contains 40 sections, 4 equations, 14 figures, 6 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Using Task-Specific Data to Query Offline Datasets. Using a small amount of task-specific human expert feedback (i.e. interventions or demonstrations) (blue), our method learns to select the relevant portions (green) of a broad, unlabeled offline dataset (red) to efficiently learn the target task. In this example the task is to have the robot place the square on the right peg (shown with the initial and final frame in blue). While the broader dataset might include irrelevant data (initial and final frame shown in red) where the robot is placing the square on the left peg, it includes useful data providing diversity in terms of how the square needs to be placed on the right peg (shown with initial and final frame in green). Our algorithm identifies these relevant data from the broader dataset and learns from them while ignoring the irrelevant data.
  • Figure 2: The Behavior Retrieval Method. Our approach has 3 main steps. (A) Using the unlabeled offline data $\mathcal{D}_{\text{prior}}$ we pre-train a state-action embedding. (B) We use the pre-trained embedding to look up similar transitions in the offline data $\mathcal{D}_{\text{prior}}$ that are relevant to the task data $\mathcal{D}_t$. (C) We then train a policy with behavior cloning on the mix of the task-specific and retrieved data.
  • Figure 3: Training the State-Action Embedding. We train a variational auto-encoder jointly on states and actions to produce our state-action embedding $z_{sa}$.
  • Figure 4: Retrieving from the Unlabeled Dataset. Using our pretrained state-action embedder, we compute the embeddings for the offline dataset $\mathcal{D}_{\text{prior}}$ and the small number of task-specific demos $\mathcal{D}_t$. Then, we select transitions in the offline dataset within a certain distance of the task-specific demos, in embedding space.
  • Figure 5: Simulation Environments. We consider three simulated domains. In each domain we highlight the downstream task data $\mathcal{D}_t$ in blue, relevant offline data from $\mathcal{D}_{\text{prior}}$ in green, and irrelevant data from $\mathcal{D}_{\text{prior}}$ in red. In RoboSuite Can Pick and Place (left), the agent must pick and place a can into the bin, and irrelevant data involves throwing the can randomly. In RoboSuite Nut Assembly (middle), the agent must insert a square into the correct peg, and irrelevant data involves putting it onto the wrong peg. In PyBullet WidowX Office Cleanup (right), the agent must pick and place an eraser into a specified tray, and irrelevant data involves many actions with other objects in the scene.
  • ...and 9 more figures
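
The retrieval step described in the Figure 4 caption (keep offline transitions whose embedding lies within a certain distance of the task-specific demos) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `retrieve_indices`, the use of Euclidean distance, the fixed threshold `max_dist`, and the toy 2-D embeddings are all assumptions made for illustration. In the paper's setting the embeddings would come from the pre-trained state-action embedder.

```python
import numpy as np


def retrieve_indices(prior_emb, task_emb, max_dist):
    """Hypothetical helper: return indices of prior transitions whose
    embedding is within max_dist (Euclidean, an assumption) of at least
    one task-demo embedding."""
    # Pairwise differences via broadcasting: shape (n_prior, n_task, dim).
    diff = prior_emb[:, None, :] - task_emb[None, :, :]
    # Pairwise distances: shape (n_prior, n_task).
    dists = np.linalg.norm(diff, axis=-1)
    # Distance from each prior transition to its nearest task transition.
    nearest = dists.min(axis=1)
    return np.nonzero(nearest <= max_dist)[0]


# Toy stand-ins for embeddings of D_t (task demos) and D_prior (offline data).
task_emb = np.array([[0.0, 0.0], [1.0, 0.0]])
prior_emb = np.array([
    [0.1, 0.0],   # close to the first task embedding
    [5.0, 5.0],   # far from every task embedding
    [1.0, 0.2],   # close to the second task embedding
    [-3.0, 0.0],  # far from every task embedding
])

selected = retrieve_indices(prior_emb, task_emb, max_dist=0.5)
print(selected)  # indices of the retrieved prior transitions
```

The retrieved indices would then pick out the offline transitions that are mixed with the task-specific demonstrations for behavior cloning, as in step (C) of Figure 2. For a large offline dataset, the dense pairwise distance matrix used here would likely need to be computed in batches.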