
Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges

Xiaoqian Liu, Jianbin Jiao, Junge Zhang

TL;DR

The paper addresses the inefficiency and limited generalization of traditional decision-making methods by advocating a Pretrain-Then-Adapt framework, which uses large-scale self-supervised pretraining on diverse trajectories to produce a reusable representation for downstream tasks. It formalizes sequential decision-making via MDPs/POMDPs, recasts RL as sequence modeling, and surveys data collection, tokenization, and pretraining objectives tailored to decision tasks, followed by adaptation via fine-tuning or zero-shot approaches. Key contributions include a taxonomy of pretraining data by quality, multi-modal trajectory tokenization schemes, a range of pretraining objectives (next-token, masked-token, and contrastive variants), and practical downstream strategies (behavior cloning (BC), offline/online RL, and parameter-efficient fine-tuning (PEFT)). The paper also identifies open challenges in tokenization, objective design, data quality, catastrophic forgetting, and the lack of unified downstream evaluation, and outlines directions toward robust, scalable decision foundation models with real-world impact.
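Recasting RL as sequence modeling is often done as in Decision Transformer, though this is not necessarily the paper's own scheme. That approach computes the return-to-go at each timestep and flattens a trajectory into interleaved (return-to-go, observation, action) tokens. The minimal sketch below assumes scalar observations and actions and an undiscounted return-to-go by default. The `Token` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    # Hypothetical token record: a modality tag ("rtg", "obs", "act"), a raw value, and its timestep.
    modality: str
    value: float
    timestep: int


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go at step t: sum over k >= t of gamma**(k - t) * r_k, accumulated backwards."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def tokenize_trajectory(observations: List[float], actions: List[float],
                        rewards: List[float], gamma: float = 1.0) -> List[Token]:
    """Flatten one trajectory into (return-to-go, observation, action) tokens per timestep."""
    if not (len(observations) == len(actions) == len(rewards)):
        raise ValueError("observations, actions and rewards must have equal length")
    rtg = returns_to_go(rewards, gamma)
    tokens: List[Token] = []
    for t, (g, o, a) in enumerate(zip(rtg, observations, actions)):
        tokens.append(Token("rtg", g, t))
        tokens.append(Token("obs", o, t))
        tokens.append(Token("act", a, t))
    return tokens
```

For rewards `[1.0, 0.0, 2.0]` the returns-to-go are `[3.0, 2.0, 2.0]`. Each timestep contributes three tokens, so a length-T trajectory becomes a sequence of 3T tokens. A real model would embed each modality with its own encoder rather than keep raw floats.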

Abstract

Decision-making is a dynamic process requiring perception, memory, and reasoning to make choices and find optimal policies. Traditional approaches to decision-making suffer from low sample efficiency and poor generalization, whereas large-scale self-supervised pretraining has enabled fast adaptation via fine-tuning or few-shot learning in language and vision. We therefore argue for integrating knowledge acquired from generic large-scale self-supervised pretraining into downstream decision-making problems. We propose a Pretrain-Then-Adapt pipeline and survey recent work on data collection, pretraining objectives, and adaptation strategies for decision-making pretraining and downstream inference. Finally, we identify critical challenges and future directions for developing decision foundation models with the help of generic and flexible self-supervised pretraining.

Paper Structure

This paper contains 24 sections, 1 equation, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Pretrain-then-Adapt pipeline for decision foundation models. Left: self-supervised pretraining with a Transformer architecture involves two basic objectives, next token prediction and masked token prediction. Building on different RL components, various other pretraining objectives can be designed for decision foundation models, as introduced in the section on pretraining objectives. Right: downstream inference tasks can be roughly divided into three categories: action inference, dynamics inference, and trajectory inference, each detailed in the section on inference tasks. Different colors denote different data modalities in trajectory sequences; grey marks a masked token.
  • Figure 2: Tokenization strategies for self-supervised pretraining on decision-making tasks. Here the raw trajectory data comprise observations and actions. Note that, in principle, the tokenization can handle any modality.
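The two basic objectives in Figure 1 differ mainly in how input and target sequences are derived from the same token stream. The following is a minimal sketch under stated assumptions:

- Tokens are already integer ids.
- Id `0` is a hypothetical reserved `[MASK]` id that real tokens never use.
- `-100` marks positions excluded from the loss, mirroring the default `ignore_index` of PyTorch's cross-entropy loss.

Unlike BERT's 80/10/10 replacement rule, this sketch always substitutes the mask id.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_ID = 0    # hypothetical [MASK] id; assumes no real token uses it
IGNORE = -100  # target value for positions that do not contribute to the loss


def next_token_pairs(ids: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Causal objective: from tokens 0..t predict token t+1 (targets are inputs shifted by one)."""
    if len(ids) < 2:
        raise ValueError("need at least two tokens")
    return list(ids[:-1]), list(ids[1:])


def masked_token_pairs(ids: Sequence[int], mask_prob: float = 0.15,
                       positions: Optional[Sequence[int]] = None,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Masked objective: replace chosen tokens with MASK_ID; only those positions keep a target.

    If `positions` is given, exactly those indices are masked (e.g. every action token);
    otherwise each position is masked independently with probability `mask_prob`.
    """
    rng = random.Random(seed)  # seeded so the same call always produces the same mask
    if positions is None:
        chosen = {i for i in range(len(ids)) if rng.random() < mask_prob}
    else:
        chosen = set(positions)
    inputs = [MASK_ID if i in chosen else tok for i, tok in enumerate(ids)]
    targets = [tok if i in chosen else IGNORE for i, tok in enumerate(ids)]
    return inputs, targets
```

For an interleaved observation/action sequence, passing `positions` equal to the action-token indices masks every action. The masked objective then becomes predicting actions from the surrounding observations. This is one example of how the choice of which RL component to mask yields a different pretraining objective.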