SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

Jesse Zhang; Karl Pertsch; Jiahui Zhang; Joseph J. Lim

TL;DR

SPRINT addresses the high cost of human language annotation for pre-training robot policies by automatically expanding the set of pre-training tasks. It relies on two ideas: (1) language-model-based aggregation, which composes longer, semantically meaningful instructions from existing sub-task instructions, and (2) cross-trajectory skill chaining via offline RL, which stitches together segments from different trajectories to enable long-horizon skill learning while preserving the MDP. The approach trains a language-conditioned policy with an instruction-conditioned critic in a fully offline setting. On ALFRED-RL and a real robot kitchen manipulation task, it yields faster downstream learning and more robust zero-shot generalization than strong baselines such as L-BC, Episodic Transformers, and SayCan. The results show that SPRINT improves long-horizon task execution and transfer to unseen environments, while significantly reducing the need for manual task annotation and enabling practical deployment on real robots.
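
As a rough illustration of idea (1), the sketch below enumerates every contiguous run of two or more annotated sub-tasks in a trajectory and gives each run a single composite instruction. The names `Segment`, `aggregate_adjacent`, and `join_instructions` are hypothetical and introduced only for this example. In particular, `join_instructions` merely concatenates text and stands in for the LLM summarization call, whose prompt and model are not reproduced here. Whether SPRINT relabels every such run or only a subset is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Segment:
    """One annotated sub-trajectory: a half-open step range [start, end) and its instruction."""
    start: int
    end: int
    instruction: str


def join_instructions(instructions: Sequence[str]) -> str:
    """Placeholder for the LLM summarizer (assumption): simply concatenates the sub-task texts."""
    return " ".join(text.strip() for text in instructions)


def aggregate_adjacent(
    segments: Sequence[Segment],
    summarize: Callable[[Sequence[str]], str] = join_instructions,
) -> List[Segment]:
    """Return one relabeled segment for every contiguous run of two or more sub-tasks."""
    relabeled: List[Segment] = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            run = segments[i : j + 1]
            # The new segment spans from the first sub-task's start to the last one's end.
            composite = summarize([seg.instruction for seg in run])
            relabeled.append(Segment(run[0].start, run[-1].end, composite))
    return relabeled
```

For a trajectory with N annotated sub-tasks, this sketch yields N(N-1)/2 additional labeled spans that can be added back to the pre-training dataset without further human annotation.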

Abstract

Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. Thus, we propose SPRINT, a scalable offline policy pre-training approach which substantially reduces the human effort needed for pre-training a diverse set of skills. Our method uses two core ideas to automatically expand a base set of pre-training tasks: instruction relabeling via large language models and cross-trajectory skill chaining through offline reinforcement learning. As a result, SPRINT pre-training equips robots with a much richer repertoire of skills. Experimental results in a household simulator and on a real robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks than previous pre-training approaches. Website at https://clvrai.com/sprint.
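
One way to picture cross-trajectory skill chaining is as a relabeling of offline transitions rather than a literal concatenation of two trajectories. The sketch below is an illustration under stated assumptions, not the paper's implementation: the `Transition` fields, the `chain_segments` function, the zero reward on the first segment, and the bootstrap fields are all hypothetical. The first segment is relabeled with the chained instruction, and its last transition records that the critic target should bootstrap from the first state of the second segment under that segment's own instruction. No environment transition between the two trajectories is fabricated.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Transition:
    state: str
    action: str
    instruction: str
    reward: float
    done: bool
    # Hypothetical fields: if set, the critic target uses this (state, instruction)
    # pair for bootstrapping instead of the recorded next state.
    bootstrap_state: Optional[str] = None
    bootstrap_instruction: Optional[str] = None


def chain_segments(
    seg_a: Sequence[Transition],
    seg_b: Sequence[Transition],
    chained_instruction: str,
) -> List[Transition]:
    """Relabel seg_a so that finishing it hands off to seg_b in the critic target."""
    if not seg_a or not seg_b:
        raise ValueError("both segments must contain at least one transition")
    # Under the chained instruction, seg_a alone does not complete the task:
    # no reward yet, and its last step is not terminal (assumption of this sketch).
    relabeled_a = [
        replace(t, instruction=chained_instruction, reward=0.0, done=False)
        for t in seg_a
    ]
    # The handoff lives only in the value target, not in the environment dynamics.
    relabeled_a[-1] = replace(
        relabeled_a[-1],
        bootstrap_state=seg_b[0].state,
        bootstrap_instruction=seg_b[0].instruction,
    )
    # seg_b keeps its original instruction and rewards.
    return relabeled_a + list(seg_b)
```

The design choice worth noting is that the second segment is left untouched, so every stored state-action-next-state tuple still comes from real agent experience.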

Paper Structure (46 sections, 4 equations, 18 figures, 4 tables, 1 algorithm)

Introduction
Related Work
SPRINT: Scalable Policy Pre-Training with Language Instructions
Instruction-Conditioned Offline RL
Cross-Trajectory Chaining
Experiments
Experimental Setup
SPRINT Solves Long-Horizon Tasks Zero-Shot
SPRINT Finetunes Effectively in Unseen Environments
Ablation Studies
Discussion and Limitations
Large Language Model Prompt
Baselines and Implementation
ALFRED Details
Real Robot Implementation Details
...and 31 more sections

Figures (18)

Figure 2: SPRINT overview. We assume access to a dataset of agent experience with language instructions for the performed skills (1). Collecting such instructions with human hindsight annotation is a flexible yet costly approach for defining pre-training tasks. Thus, SPRINT introduces two approaches for automatically growing the set of pre-training tasks without additional human effort: (2) by aggregating language instructions with an LLM and adding the relabeled trajectories back into the pre-training dataset (Section \ref{sec:llm_relabeling}), (3) by performing cross-trajectory chaining of skills to enable pre-training of skills that are unseen in the offline agent experience (Section \ref{sec:cross_chaining}).
Figure 3: A shortened example of the LLM prompt. See the full prompt in the appendix, Section \ref{sec:appendix:prompt}.
Figure 4: Left: ALFRED provides a rich set of long-horizon, meaningful tasks and a dataset of 6.6k language-annotated demos. We introduce the ALFRED-RL Benchmark which tests finetuning of RL agents on unseen tasks and scenes. Right: Our Jaco robot arm with RGB image-based control.
Figure 5: ALFRED-RL evaluation results. Left: Zero-shot performance on EVALINSTRUCT and EVALLENGTH. SPRINT is able to complete substantially more subtasks than prior approaches. Middle: Breakdown of performance by task length. SPRINT performs well on challenging, long tasks. Numerical results are in appendix Table \ref{tab:zero_shot_numbers}. Right: Finetuning performance in unseen floor plans of EVALSCENE. SPRINT learns more effectively in new floor plans, reaching higher performance.
Figure 6: Successful rollout of a SPRINT agent offline finetuned for the task above with object combinations not in the pre-training data. SPRINT solves all 8 tasks in sequence.
...and 13 more figures
