FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning

Li-Heng Lin, Yuchen Cui, Amber Xie, Tianyu Hua, Dorsa Sadigh

TL;DR

FlowRetrieval addresses the data-hungry nature of few-shot imitation learning with motion-guided data retrieval. It trains a variational autoencoder on optical-flow representations to build a motion-centric latent space, retrieves the prior data whose motion is most similar to the target demonstrations, and trains the policy with an auxiliary optical-flow prediction loss that shapes its representations. Evaluations on five manipulation tasks, in simulation and on real robots, show substantial gains over baselines: on average a 27% higher success rate than the best prior retrieval-based method, and 3.7× the performance of an imitation-learning baseline trained on all prior and target data in the real-world Pen-in-Cup task. The method decouples retrieval from policy learning and demonstrates strong cross-task transfer, though it incurs retrieval overhead and requires tuning a threshold that sets how much prior data is retrieved.
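The retrieval step summarized above can be illustrated with a minimal sketch. It assumes the optical-flow embeddings have already been computed by some encoder and are given as plain lists of floats; the function name `retrieve_by_motion`, the Euclidean distance, and the `keep_fraction` parameter (standing in for the retrieval threshold) are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def retrieve_by_motion(
    target_latents: Sequence[Sequence[float]],
    prior_latents: Sequence[Sequence[float]],
    keep_fraction: float,
) -> List[int]:
    """Return indices of the prior samples closest to any target sample.

    Assumption: inputs are precomputed latent embeddings of optical flow.
    keep_fraction is a hypothetical stand-in for the retrieval threshold.
    """
    # Distance from each prior embedding to its nearest target embedding.
    nearest = [
        min(math.dist(p, t) for t in target_latents)
        for p in prior_latents
    ]
    # Keep the closest fraction of the prior data (at least one sample).
    n_keep = max(1, int(len(prior_latents) * keep_fraction))
    ranked = sorted(range(len(prior_latents)), key=lambda i: nearest[i])
    return sorted(ranked[:n_keep])


# Toy usage: two target embeddings, four prior embeddings, keep half.
target = [[0.0, 0.0], [1.0, 1.0]]
prior = [[0.1, 0.0], [5.0, 5.0], [0.9, 1.2], [-3.0, 4.0]]
kept = retrieve_by_motion(target, prior, keep_fraction=0.5)
print(kept)
```

A larger `keep_fraction` adds more prior data but admits less similar motions, which is one way to picture why the retrieval size needs tuning.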

Abstract

Few-shot imitation learning relies on only a small number of task-specific demonstrations to efficiently adapt a policy for a given downstream task. Retrieval-based methods come with a promise of retrieving relevant past experiences to augment this target data when learning policies. However, existing data retrieval methods fall under two extremes: they either rely on the existence of exact behaviors with visually similar scenes in the prior data, which is impractical to assume; or they retrieve based on semantic similarity of high-level language descriptions of the task, which might not be that informative about the shared low-level behaviors or motions across tasks, which are often a more important factor for retrieving relevant data for policy learning. In this work, we investigate how we can leverage motion similarity in the vast amount of cross-task data to improve few-shot imitation learning of the target task. Our key insight is that motion-similar data carries rich information about the effects of actions and object interactions that can be leveraged during few-shot adaptation. We propose FlowRetrieval, an approach that leverages optical flow representations both for extracting motions similar to the target task from prior data and for guiding the learning of a policy that can maximally benefit from such data. Our results show FlowRetrieval significantly outperforms prior methods across simulated and real-world domains, achieving on average 27% higher success rate than the best retrieval-based prior method. In the Pen-in-Cup task with a real Franka Emika robot, FlowRetrieval achieves 3.7x the performance of the baseline imitation learning technique that learns from all prior and target data. Website: https://flow-retrieval.github.io

Paper Structure (21 sections, 7 equations, 11 figures, 1 table, 1 algorithm)

Figures (11)

  • Figure 1: Overview of FlowRetrieval. We first learn a motion-centric latent space by training a VAE to embed optical flow, then retrieve prior data similar to the target data by measuring pairwise distances in the learned latent space, and finally augment policy co-training with a supervised prediction loss for optical flow.
  • Figure 2: Different types of motion guidance for imitation learning. We present a spectrum of motion guidance used in the literature, either for retrieval or for directly learning the policy. From left to right the guidance becomes more fine-grained: coarse semantic information conveyed through language, then feature traces or optical flow that capture the motion and shape of objects, and finally visual dynamics, which also capture the texture of the scene.
  • Figure 3: Experimental Setup. We evaluate FlowRetrieval on 5 different manipulation tasks. The bottom row shows the prior dataset used in each experiment and its metadata.
  • Figure 4: Quantitative Results. We plot success rates (%) of learned policies and observe that FlowRetrieval outperforms baselines across tasks. Simulation results are averaged over 2 training seeds and 3 evaluation seeds (50 rollouts each). Real results are from 25 rollouts of best-of-last-3 checkpoints.
  • Figure 5: Qualitative Analysis. We visualize the most similar motions in the prior dataset to queries from target task demonstrations. For each example data point, we show the image observations $s_t$ and $s_{t+16}$, then overlay the optical flow on top of $s_t$. We see that FlowRetrieval focuses on retrieving motions similar to the target queries, while baselines may retrieve visually similar states with different motions.
  • ...and 6 more figures
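
The co-training step described in the Figure 1 caption, a policy loss augmented with a supervised optical-flow prediction loss, can be pictured with the rough sketch below. The mean-squared-error form of both terms, the helper names `mse` and `cotraining_loss`, and the `flow_weight` coefficient are all assumptions made for illustration; the page does not state the paper's exact objective.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cotraining_loss(
    pred_action: Sequence[float],
    demo_action: Sequence[float],
    pred_flow: Sequence[float],
    true_flow: Sequence[float],
    flow_weight: float = 0.1,  # hypothetical weighting, not from the paper
) -> float:
    """Imitation loss plus a weighted auxiliary flow-prediction loss.

    Assumption: both terms are MSE; the source only says a supervised
    optical-flow prediction loss is added during policy co-training.
    """
    imitation = mse(pred_action, demo_action)
    auxiliary = mse(pred_flow, true_flow)
    return imitation + flow_weight * auxiliary


# Toy usage with flattened action and flow vectors.
loss = cotraining_loss([0.0], [1.0], [1.0, 2.0], [1.0, 4.0], flow_weight=0.5)
print(loss)
```

In this picture the same loss would be applied to both target demonstrations and retrieved prior data, with the auxiliary term encouraging the policy's representation to encode motion.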