Masked Autoencoding for Scalable and Generalizable Decision Making

Fangchen Liu; Hao Liu; Aditya Grover; Pieter Abbeel

TL;DR

MaskDP introduces a scalable, self-supervised pretraining framework for reinforcement learning that masks state-action trajectories and reconstructs them with a bidirectional Transformer. By employing random masking with multiple ratios, MaskDP learns forward and inverse dynamics from diverse unlabeled data, enabling zero-shot transfer to goal-reaching and skill prompting, and competitive offline RL performance. The method demonstrates model-size scaling and robustness across data quality, while supporting open-loop generation and closed-loop replanning at deployment. Overall, MaskDP provides a general masked-prediction paradigm for sequential decision making that parallels success in NLP and vision.

Abstract

We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data, similar to current large vision and language models. To this end, this paper presents masked decision prediction (MaskDP), a simple and scalable self-supervised pretraining method for reinforcement learning (RL) and behavioral cloning (BC). In our MaskDP approach, we apply a masked autoencoder (MAE) to state-action trajectories, wherein we randomly mask state and action tokens and reconstruct the missing data. By doing so, the model is required to infer masked-out states and actions and extract information about dynamics. We find that masking different proportions of the input sequence significantly helps with learning a better model that generalizes well to multiple downstream tasks. In our empirical study, we find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching, and it can zero-shot infer skills from a few example transitions. In addition, MaskDP transfers well to offline RL and shows promising scaling behavior with respect to model size. It is amenable to data-efficient finetuning, achieving results competitive with prior methods based on autoregressive pretraining.
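The random masking described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the ratio set `MASK_RATIOS`, the function names, and the choice to score only masked tokens with a mean squared error are all assumptions made for illustration. The abstract states only that state and action tokens are masked at random, with different proportions, and then reconstructed.

```python
import numpy as np

# Hypothetical ratio set: the abstract says only that different
# proportions of the sequence are masked, not which ones.
MASK_RATIOS = (0.15, 0.35, 0.55, 0.75, 0.95)


def sample_token_mask(num_steps, rng, ratios=MASK_RATIOS):
    """Sample a random mask over the 2 * num_steps state/action tokens.

    Returns a boolean array of shape (num_steps, 2), where column 0 marks
    masked state tokens and column 1 marks masked action tokens, together
    with the mask ratio that was drawn for this trajectory.
    """
    ratio = float(rng.choice(ratios))
    num_tokens = 2 * num_steps
    num_masked = int(round(ratio * num_tokens))
    flat = np.zeros(num_tokens, dtype=bool)
    # Choose which tokens to hide, without replacement.
    flat[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    return flat.reshape(num_steps, 2), ratio


def masked_reconstruction_error(pred_s, pred_a, true_s, true_a, mask):
    """Mean squared error averaged over masked tokens only (an assumption)."""
    err_s = ((pred_s - true_s) ** 2).mean(axis=-1)  # shape (num_steps,)
    err_a = ((pred_a - true_a) ** 2).mean(axis=-1)  # shape (num_steps,)
    per_token = np.stack([err_s, err_a], axis=1)    # shape (num_steps, 2)
    return float(per_token[mask].mean()) if mask.any() else 0.0
```

Drawing a fresh ratio for each trajectory exposes the model to both lightly and heavily masked sequences, which is one plausible way to realize the "different proportions" the abstract refers to.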

Paper Structure (48 sections, 18 figures, 9 tables)


Introduction
Related work
Masked modeling in language and vision.
Sequence modeling in RL
Unsupervised pretraining in RL
Method
MaskDP Pretraining
Random masking.
Architecture
Prediction target
MaskDP Downstream Tasks
MaskDP for goal reaching
MaskDP for skill prompting
MaskDP for offline RL
Experiments
...and 33 more sections

Figures (18)

Figure 1: Illustration of MaskDP. During the pretraining stage, we perform the masked token prediction task. After pretraining, the model can be deployed to various downstream tasks using different mask patterns.
Figure 2: Single task pretraining followed by single goal reaching downstream task. MaskDP with closed-loop execution achieves the best performance on all the tasks, and gets the most significant improvements in the Quadruped domain, which is higher dimensional.
Figure 3: Single task pretraining followed by multiple goals reaching downstream task. MaskDP achieves significant improvement on all the tasks with better flexibility in sequential goal reaching.
Figure 4: Multiple tasks pretraining followed by single goal reaching downstream task, where MaskDP with closed-loop execution works the best, especially in the Quadruped domain.
Figure 5: Multiple task pretraining followed by multiple goals reaching downstream task.
...and 13 more figures
