Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

Ruijie Zheng; Yongyuan Liang; Xiyao Wang; Shuang Ma; Hal Daumé; Huazhe Xu; John Langford; Praveen Palanisamy; Kalyan Shankar Basu; Furong Huang

TL;DR

Premier-TACO targets few-shot policy learning in sequential decision-making by learning a transferable visual representation through a reward-free, dynamics-based objective. It extends the TACO framework by replacing batch-wide negatives with a single hard negative sampled from a temporal window, optimizing the mutual-information objective $\mathcal{I}(Z_{t+K}; [Z_t, U_t, \ldots, U_{t+K-1}])$ via InfoNCE. Empirically, it delivers state-of-the-art or near state-of-the-art performance across DeepMind Control Suite, MetaWorld, and LIBERO with limited demonstrations, and improves unseen-task generalization and robustness to data quality. It also demonstrates compatibility with large pretrained encoders such as R3M and improves sample efficiency when finetuning with in-domain control data. The work contributes a scalable, control-centric, multitask pretraining paradigm with broad practical impact for robotics and sequential decision-making tasks.
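
For reference, the generic InfoNCE loss and its standard mutual-information bound can be written as follows. This is the textbook form, not the paper's own equation, which is not reproduced on this page. The critic $f$, the shorthand $X = [Z_t, U_t, \ldots, U_{t+K-1}]$ and $Y = Z_{t+K}$, and the candidate count $N$ (one positive $y^{+}$ among $y_1, \ldots, y_N$) are notation assumed here for illustration.

$$
\mathcal{L}_{\text{InfoNCE}} = -\,\mathbb{E}\left[\log \frac{\exp f(x, y^{+})}{\sum_{j=1}^{N} \exp f(x, y_j)}\right],
\qquad
\mathcal{I}(Y; X) \;\ge\; \log N - \mathcal{L}_{\text{InfoNCE}}.
$$

The bound is usually stated for negatives drawn from the marginal distribution of $Y$. Whether and how it carries over to negatives sampled from a temporal window is not stated in this summary.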

Abstract

We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy is crucial in significantly boosting TACO's computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation on a diverse set of continuous control benchmarks, including DeepMind Control Suite, MetaWorld, and LIBERO, demonstrates Premier-TACO's effectiveness in pretraining visual representations, significantly enhancing few-shot imitation learning of novel tasks. Our code, pretraining data, and pretrained model checkpoints will be released at https://github.com/PremierTACO/premier-taco. Our project webpage is at https://premiertaco.github.io.
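
The negative example sampling strategy mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the negative is drawn uniformly from frames within `window` steps of the positive frame in the same episode, that similarity is a plain dot product between already-projected embeddings, and that each sample's loss is a two-way softmax between its positive and its single negative. The function names and the `window` parameter are hypothetical.

```python
import numpy as np


def sample_window_negative(pos_idx, episode_len, window, rng):
    """Pick one frame index within `window` steps of pos_idx, excluding pos_idx.

    Assumption: the temporal window is clipped at the episode boundaries.
    """
    lo = max(0, pos_idx - window)
    hi = min(episode_len - 1, pos_idx + window)
    candidates = [i for i in range(lo, hi + 1) if i != pos_idx]
    if not candidates:
        raise ValueError("window contains no frame other than the positive")
    return int(rng.choice(candidates))


def single_negative_infonce(query, positive, negative):
    """Two-way InfoNCE, averaged over the batch.

    Per sample: -log(exp(q.p) / (exp(q.p) + exp(q.n))) = log(1 + exp(q.n - q.p)).
    All inputs have shape (batch, dim). Dot-product similarity is an assumption.
    """
    pos_logit = np.sum(query * positive, axis=-1)
    neg_logit = np.sum(query * negative, axis=-1)
    # logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return float(np.mean(np.logaddexp(0.0, neg_logit - pos_logit)))


# Hypothetical usage: the positive is frame 10 of a 50-frame episode.
rng = np.random.default_rng(0)
neg_idx = sample_window_negative(pos_idx=10, episode_len=50, window=5, rng=rng)
```

Under these assumptions each query is compared against two candidates rather than against every other item in the batch, which is one way to read the efficiency claim. Frames close in time tend to look alike, so the single negative is a hard one.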

Paper Structure (17 sections, 1 equation, 11 figures, 6 tables)

Introduction
Preliminary
Multitask Offline Pretraining
TACO: Temporal Action Driven Contrastive Learning Objective
Method
Experiment
Related Work
Conclusion
Detailed Discussion of Related Work
Additional Experiment Results
Finetuning
Pretrained Visual Representations
LIBERO-10 success rate
Additional Experiment Results on Downstream Online Reinforcement Learning
Implementation Details
...and 2 more sections

Figures (11)

Figure 1: Performance of the Premier-TACO pretrained visual representation for few-shot imitation learning on downstream unseen tasks from DeepMind Control Suite, MetaWorld, and LIBERO. LfS here represents learning from scratch.
Figure 2: Difference between Premier-TACO and TACO for sampling negative examples.
Figure 3: An illustration of the Premier-TACO contrastive loss design. The two 'State Encoder's are identical, as are the two 'Proj. Layer $H$'s. One negative example is sampled from the neighbors of frame $s_{t+K}$.
Figure 4: Pretrain and test task split for DeepMind Control Suite, MetaWorld, and LIBERO. The left figures are DeepMind Control Suite tasks and the right figures are MetaWorld tasks.
Figure 6: [(W3) Robustness] Premier-TACO pretrained with an exploratory dataset vs. Premier-TACO pretrained with a randomly collected dataset.
...and 6 more figures
