SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim

TL;DR

SPRINT addresses the high cost of human language annotation for pre-training robotic policies by automatically expanding the pre-training task set. It uses two key ideas: (1) language-model-based aggregation to compose longer, semantically meaningful instructions from existing sub-tasks, and (2) cross-trajectory skill chaining via offline RL to stitch together segments from different trajectories, enabling long-horizon skill learning while preserving the MDP. The approach trains a language-conditioned policy with an instruction-conditioned critic in a fully offline setting, and its efficacy is demonstrated on ALFRED-RL and a real robot kitchen manipulation task, where it yields faster downstream learning and robust zero-shot generalization compared to strong baselines such as L-BC, Episodic Transformers, and SayCan. The results show that SPRINT improves long-horizon task execution and transfer to unseen environments, significantly reducing the need for manual task annotation while enabling practical deployment in real-world robotic contexts.
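
The first idea, language-model-based aggregation, can be illustrated with a minimal sketch. It assumes that every contiguous run of two or more annotated sub-tasks in a trajectory is a candidate for a new composite task, which the text above does not state explicitly. The names `Segment`, `aggregate_instructions`, and `join_summarizer` are hypothetical, and the summarizer is a plain string join standing in for an actual LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    instruction: str  # language instruction for this sub-task
    start: int        # first timestep of the segment (inclusive)
    end: int          # last timestep of the segment (inclusive)


def aggregate_instructions(
    segments: List[Segment],
    summarize: Callable[[List[str]], str],
) -> List[Segment]:
    """Build one composite segment for every run of 2+ adjacent sub-tasks."""
    composites = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            texts = [s.instruction for s in segments[i:j + 1]]
            # The composite spans from the first sub-task's start
            # to the last sub-task's end.
            composites.append(
                Segment(summarize(texts), segments[i].start, segments[j].end)
            )
    return composites


def join_summarizer(texts: List[str]) -> str:
    # Stand-in for an LLM (assumption): a real system would prompt a
    # language model to produce a shorter, higher-level instruction.
    return ", then ".join(texts)


demo = [
    Segment("pick up the mug", 0, 9),
    Segment("put the mug in the sink", 10, 24),
    Segment("turn on the faucet", 25, 30),
]
new_tasks = aggregate_instructions(demo, join_summarizer)
for task in new_tasks:
    print(task.start, task.end, task.instruction)
```

In this sketch, three annotated sub-tasks yield three additional composite tasks without any new human annotation; swapping `join_summarizer` for an LLM-backed function would turn the joined text into a single higher-level instruction.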

Abstract

Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. Thus, we propose SPRINT, a scalable offline policy pre-training approach which substantially reduces the human effort needed for pre-training a diverse set of skills. Our method uses two core ideas to automatically expand a base set of pre-training tasks: instruction relabeling via large language models and cross-trajectory skill chaining through offline reinforcement learning. As a result, SPRINT pre-training equips robots with a much richer repertoire of skills. Experimental results in a household simulator and on a real robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks than previous pre-training approaches. Website at https://clvrai.com/sprint.
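
Cross-trajectory skill chaining can be sketched in a similarly hedged way. The snippet below is one plausible reading, not the authors' implementation: it assumes that a segment from one trajectory is relabeled with a concatenated instruction, and that its final transition receives a bootstrapped reward from a critic evaluated on the instruction of a segment from a different trajectory. `Transition`, `chain_segments`, and `toy_critic` are hypothetical names, and the constant critic is a placeholder for a learned instruction-conditioned Q-function.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Transition:
    state: int
    action: int
    reward: float
    instruction: str
    done: bool


def chain_segments(
    seg_a: List[Transition],
    instr_b: str,
    q_value: Callable[[int, int, str], float],
) -> List[Transition]:
    """Relabel segment A so that it is trained as 'A, then B'."""
    if not seg_a:
        return []
    chained = f"{seg_a[0].instruction}. {instr_b}"
    # All transitions get the chained instruction and zero reward first.
    relabeled = [
        replace(t, instruction=chained, reward=0.0, done=False) for t in seg_a
    ]
    last = relabeled[-1]
    # Assumption: the critic's estimate for skill B at the end of A
    # serves as the terminal reward of the relabeled segment.
    bootstrap = q_value(last.state, last.action, instr_b)
    relabeled[-1] = replace(last, reward=bootstrap, done=True)
    return relabeled


def toy_critic(state: int, action: int, instruction: str) -> float:
    # Placeholder for a learned Q-function (assumption).
    return 0.5


seg_a = [
    Transition(state=s, action=0, reward=0.0,
               instruction="open the drawer", done=False)
    for s in range(3)
]
chained_seg = chain_segments(seg_a, "pick up the spoon", toy_critic)
print(chained_seg[-1])
```

The design point this illustrates is that the chained segment never requires the two skills to appear in the same recorded trajectory; the critic supplies the link between them.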

Paper Structure

This paper contains 46 sections, 4 equations, 18 figures, 4 tables, and 1 algorithm.

Figures (18)

  • Figure 2: SPRINT overview. We assume access to a dataset of agent experience with language instructions for the performed skills (1). Collecting such instructions with human hindsight annotation is a flexible yet costly approach for defining pre-training tasks. Thus, SPRINT introduces two approaches for automatically growing the set of pre-training tasks without additional human effort: (2) aggregating language instructions with an LLM and adding the relabeled trajectories back into the pre-training dataset (Section \ref{sec:llm_relabeling}), and (3) performing cross-trajectory chaining of skills to enable pre-training of skills that are unseen in the offline agent experience (Section \ref{sec:cross_chaining}).
  • Figure 3: A shortened example of the LLM prompt. See the full prompt in the appendix, Section \ref{sec:appendix:prompt}.
  • Figure 4: Left: ALFRED provides a rich set of long-horizon, meaningful tasks and a dataset of 6.6k language-annotated demos. We introduce the ALFRED-RL Benchmark which tests finetuning of RL agents on unseen tasks and scenes. Right: Our Jaco robot arm with RGB image-based control.
  • Figure 5: ALFRED-RL evaluation results. Left: Zero-shot performance on EVALINSTRUCT and EVALLENGTH. SPRINT is able to complete substantially more subtasks than prior approaches. Middle: Breakdown of performance by task length. SPRINT performs well on challenging, long tasks. Numerical results are in appendix Table \ref{tab:zero_shot_numbers}. Right: Finetuning performance in unseen floor plans of EVALSCENE. SPRINT learns more effectively in new floor plans, reaching higher performance.
  • Figure 6: Successful rollout of a SPRINT agent offline finetuned for the task above with object combinations not in the pre-training data. SPRINT solves all 8 tasks in sequence.
  • ...and 13 more figures