Offline Imitation Learning Through Graph Search and Retrieval

Zhao-Heng Yin; Pieter Abbeel

TL;DR

GSR addresses the challenge of learning from suboptimal demonstrations in robotic manipulation by replacing offline RL with a graph-based preprocessing step. It constructs a graph from pretrained representations, uses graph search to evaluate behavior quality, and applies retrieval to reweight transitions before behavior cloning. Empirically, GSR achieves 10% to 30% higher success rates and over 30% higher proficiency than baselines on both simulated and real-world tasks while remaining computationally efficient. The approach offers a practical, fully offline way to leverage suboptimal data for dexterous manipulation from visually rich observations.
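Behavior cloning on a reweighted dataset can be written as a weighted maximum-likelihood objective. The form below is a generic sketch, not a loss stated in this summary: it assumes a dataset of state-action pairs $(s_i, a_i)$, $i = 1, \dots, N$, nonnegative per-transition weights $w_i$ produced by the retrieval step, and a policy $\pi_\theta$; these symbols are illustrative.

```latex
\min_{\theta} \; -\frac{1}{\sum_{j=1}^{N} w_j} \sum_{i=1}^{N} w_i \, \log \pi_{\theta}(a_i \mid s_i)
```

Setting every $w_i = 1$ recovers plain behavior cloning, so in this sketch the reweighting only changes which transitions the policy is pushed to imitate, not the learning procedure itself.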

Abstract

Imitation learning is a powerful machine learning approach for a robot to acquire manipulation skills. Nevertheless, many real-world manipulation tasks involve precise and dexterous robot-object interactions, which make it difficult for humans to collect high-quality expert demonstrations. As a result, a robot has to learn skills from suboptimal demonstrations and unstructured interactions, which remains a key challenge. Existing works typically use offline deep reinforcement learning (RL) to solve this challenge, but in practice these algorithms are unstable and fragile due to the deadly triad issue (the combination of function approximation, bootstrapping, and off-policy learning). To overcome this problem, we propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval. We first use pretrained representations to organize the interaction experience into a graph and perform a graph search to calculate the values of different behaviors. Then, we apply a retrieval-based procedure to identify the best behavior (actions) at each state and use behavior cloning to learn that behavior. We evaluate our method on both simulated and real-world robotic manipulation tasks with complex visual inputs, covering various precise and dexterous manipulation skills with objects of different physical properties. GSR achieves a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines. Our project page is at https://zhaohengyin.github.io/gsr.
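One way to "calculate the values of different behaviors" with graph search is sketched below. This is an assumption-laden illustration, not the authors' code: it assumes unit-cost directed edges, a known set of goal (task-success) nodes, and a hypothetical value of $\gamma^{d}$, where $d$ is the fewest edges from a node to any goal. A single breadth-first search from the goals over reversed edges yields all distances at once; `node_values` and `gamma` are illustrative names.

```python
from collections import deque


def node_values(num_nodes, edges, goal_nodes, gamma=0.99):
    """Illustrative value estimate: gamma ** (fewest edges from node to a goal).

    num_nodes  -- states are indexed 0..num_nodes-1
    edges      -- iterable of directed (u, v) pairs: u can reach v in one step
    goal_nodes -- indices of states assumed to mark task success
    Nodes that cannot reach any goal receive value 0.0.
    """
    # Reverse every edge so that one BFS started from the goal set
    # gives each node its shortest distance *to* the nearest goal.
    reverse_adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        reverse_adj[v].append(u)

    dist = [None] * num_nodes
    queue = deque()
    for g in goal_nodes:
        if dist[g] is None:
            dist[g] = 0
            queue.append(g)

    while queue:
        v = queue.popleft()
        for u in reverse_adj[v]:
            if dist[u] is None:  # with unit costs, the first visit is shortest
                dist[u] = dist[v] + 1
                queue.append(u)

    return [0.0 if d is None else gamma ** d for d in dist]


values = node_values(5, [(0, 1), (1, 2), (2, 3), (0, 2)], goal_nodes=[3], gamma=0.5)
print(values)  # [0.25, 0.25, 0.5, 1.0, 0.0]
```

In this toy chain 0 → 1 → 2 → 3 with goal node 3 and an extra edge 0 → 2, node 0 gets the same value as node 1 because the shortcut brings it within two steps of the goal, while the isolated node 4 gets 0.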

Paper Structure (30 sections, 7 equations, 13 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
  Problem Formulation
  Offline Policy Learning
Policy Learning by Graph Search and Retrieval
  Overview
  Graph Construction
    Building Vertex Set $\mathcal{V}$
    Building Edge Set $\mathcal{E}$
  Policy Improvement with Retrieval
  Implementation and Time Complexity
Experiments
  Experiment Setup
  Simulation Experiments
...and 15 more sections

Figures (13)

Figure 1: Collecting high-quality human demonstrations for imitation learning can be very difficult. Consider using a spoon, tying a rubber band, or picking up tiny objects with tweezers. These tasks require very precise and accurate manipulator movement. A human operator may behave suboptimally and retry multiple times during a single demonstration. We propose a novel algorithm called GSR to learn proficient behavior from such suboptimal data through graph search and retrieval.
Figure 2: Overview of our algorithm. We first build a graph to represent the demonstration dataset and run a graph search to evaluate the goodness (value) of each node. Then, we use a retrieval process to reassign weights to the transitions associated with each node, based on their similarity and goodness (value) scores. The entire process gives more weight to the optimal behaviors around each node, resulting in a reweighted dataset that we use for behavior cloning.
Figure 3: Identifying connectivity. Augmented edge: we add a bidirectional edge between two nodes $u$ and $v$ if each lies within the other's tolerance range in the pretrained representation space. Dataset edge: a ground-truth transition along a demonstration trajectory.
Figure 4: The real-world robot manipulation setup. We conduct experiments on a UR5 robot arm with a Robotiq gripper. We use three workspace cameras that provide $256 \times 256$ RGB observations, which makes perception challenging, especially for the precise manipulation tasks we consider.
Figure 5: Illustration of the tasks used. Top: tasks from the robomimic benchmark. Bottom: our real-world tasks.
...and 8 more figures
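The edge rule in the Figure 3 caption and the retrieval reweighting in the Figure 2 caption can be sketched together in plain Python. Everything beyond the captions is an assumption: states are indexed by integers, each state has a hypothetical per-state tolerance radius, Euclidean distance in the pretrained embedding space measures similarity, and the retrieval score multiplies a similarity kernel by an exponentiated node value. The names `build_graph` and `retrieval_weights` and the parameters `radii` and `temperature` are illustrative, and on a real dataset the quadratic loops would presumably be replaced by a nearest-neighbor index.

```python
import math


def build_graph(embeddings, trajectories, radii):
    """Return directed edges (u, v) over dataset states (illustrative sketch).

    embeddings   -- one feature vector per state (from a pretrained encoder)
    trajectories -- lists of state indices in temporal order
    radii        -- hypothetical per-state tolerance radius (assumed > 0)
    """
    edges = set()
    # Dataset edges: ground-truth transitions s_t -> s_{t+1} on each trajectory.
    for traj in trajectories:
        for u, v in zip(traj, traj[1:]):
            edges.add((u, v))
    # Augmented edges: bidirectional, added only if u and v lie
    # within each other's tolerance range in embedding space.
    n = len(embeddings)
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(embeddings[u], embeddings[v])
            if d <= radii[u] and d <= radii[v]:
                edges.add((u, v))
                edges.add((v, u))
    return edges


def retrieval_weights(embeddings, radii, values, has_action, temperature=1.0):
    """Assign each transition a weight by retrieval (illustrative sketch).

    For every query state u, retrieve states v within u's tolerance that
    have an outgoing dataset action, score them by similarity times value,
    and distribute one unit of weight over them with normalized scores.
    """
    n = len(embeddings)
    weights = [0.0] * n
    for u in range(n):
        scored = []
        for v in range(n):
            if not has_action[v]:
                continue  # e.g. the last state of a trajectory has no action
            d = math.dist(embeddings[u], embeddings[v])
            if d <= radii[u]:
                similarity = math.exp(-d / radii[u])
                goodness = math.exp(values[v] / temperature)
                scored.append((v, similarity * goodness))
        total = sum(score for _, score in scored)
        for v, score in scored:
            weights[v] += score / total
    return weights
```

Requiring mutual tolerance (both `d <= radii[u]` and `d <= radii[v]`) keeps a state with a loose radius from linking to a state whose own tolerance is tight, matching the bidirectional condition in the Figure 3 caption. In the reweighting sketch, a nearby transition with a higher node value receives a larger share of each query state's unit weight.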
