Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges
Xiaoqian Liu, Jianbin Jiao, Junge Zhang
TL;DR
The paper addresses the sample inefficiency and limited generalization of traditional decision-making methods by advocating a Pretrain-Then-Adapt framework, which uses large-scale self-supervised pretraining on diverse trajectories to produce a reusable representation for downstream tasks. It formalizes sequential decision-making via MDPs/POMDPs, recasts RL as sequence modeling, and surveys data collection, tokenization, and pretraining objectives tailored to decision tasks, followed by adaptation via fine-tuning or zero-shot approaches. Key contributions include a taxonomy of pretraining data quality, multi-modal trajectory tokenization schemes, a range of pretraining objectives (next-token, masked-token, and contrastive variants), and practical downstream strategies (BC, offline/online RL, and PEFT). The paper also identifies challenges in tokenization, objective design, data quality, catastrophic forgetting, and the lack of unified downstream evaluation, and outlines directions toward robust, scalable decision foundation models with real-world impact.
Abstract
Decision-making is a dynamic process requiring perception, memory, and reasoning to make choices and find optimal policies. Traditional approaches to decision-making suffer from poor sample efficiency and limited generalization, while large-scale self-supervised pretraining has enabled fast adaptation through fine-tuning or few-shot learning in language and vision. We therefore argue for integrating knowledge acquired from generic large-scale self-supervised pretraining into downstream decision-making problems. We propose a Pretrain-Then-Adapt pipeline and survey recent work on data collection, pretraining objectives, and adaptation strategies for decision-making pretraining and downstream inference. Finally, we identify critical challenges and future directions for developing decision foundation models with the help of generic and flexible self-supervised pretraining.
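
To make the idea of treating trajectories as sequences more concrete, the following is a minimal sketch of how a trajectory might be flattened into tokens and turned into training examples for two of the objective families surveyed (next-token and masked-token prediction). It is an illustration under stated assumptions, not the paper's method: the interleaved (observation, action, reward) ordering, the token-type tags, and the helper names `tokenize_trajectory`, `next_token_pairs`, and `mask_tokens` are all hypothetical.

```python
import random
from typing import Any, Dict, List, Tuple

Token = Tuple[str, Any]  # (token type, value); this layout is an assumption

# Hypothetical placeholder that replaces a hidden token
MASK: Token = ("mask", None)


def tokenize_trajectory(observations: List[Any],
                        actions: List[Any],
                        rewards: List[float]) -> List[Token]:
    """Interleave each timestep as (obs, act, rew) into one flat sequence."""
    tokens: List[Token] = []
    for o, a, r in zip(observations, actions, rewards):
        tokens += [("obs", o), ("act", a), ("rew", r)]
    return tokens


def next_token_pairs(tokens: List[Token]) -> List[Tuple[List[Token], Token]]:
    """Build (context, target) pairs for autoregressive next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mask_tokens(tokens: List[Token], mask_prob: float,
                seed: int = 0) -> Tuple[List[Token], Dict[int, Token]]:
    """Hide each token independently with probability mask_prob.

    Returns the corrupted sequence and a map from masked position
    to the original token, which serves as the prediction target.
    """
    rng = random.Random(seed)
    corrupted: List[Token] = []
    targets: Dict[int, Token] = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


# Toy three-step trajectory with symbolic observations and actions
tokens = tokenize_trajectory(["s0", "s1", "s2"],
                             ["a0", "a1", "a2"],
                             [1.0, 0.0, 2.0])
pairs = next_token_pairs(tokens)
corrupted, targets = mask_tokens(tokens, mask_prob=0.3)
```

In this sketch a sequence model would be trained either to predict each `target` from its `context` in `pairs`, or to recover the entries of `targets` from `corrupted`. Real pipelines would replace the symbolic values with learned embeddings of states, actions, and rewards.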
