Context-Former: Stitching via Latent Conditioned Sequence Modeling

Ziqi Zhang; Jingzehua Xu; Jinxin Liu; Zifeng Zhuang; Donglin Wang; Miao Liu; Shuai Zhang

TL;DR

ContextFormer introduces a stitching mechanism for decision making based on latent hindsight information (HI), which it achieves by endowing transformers with sequential expert matching. ContextFormer learns a latent contextual embedding z^* from a limited set of expert trajectories and optimizes a supervised policy loss. This lets it stitch sub-optimal trajectory fragments in the latent space, avoiding the conservatism of offline RL while improving generalization. A theoretical analysis connects the HI-based objective with the expert distribution and shows that z^* aligns with the expert HI in expert-dominant regions of the trajectory space. Empirically, ContextFormer achieves competitive imitation learning (IL) performance and outperforms several Decision Transformer (DT) variants trained on identical datasets. It also performs strongly on maze2d stitching tasks, and ablations examine the quantity, diversity, and quality of the demonstrations.
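To make "expert matching in the latent space" concrete, here is a toy sketch in pure Python. It is not the authors' implementation, and several pieces are assumptions. `encode` is a hypothetical mean-of-states function standing in for a learned HI encoder. z^* is fit by gradient descent on the mean squared distance to the expert embeddings, so it converges to their centroid. Sub-optimal fragments are then ranked by their latent distance to z^*. The names `encode`, `learn_z_star`, and `rank_fragments` are illustrative only.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def encode(fragment):
    """Hypothetical stand-in for a learned HI encoder: mean of the fragment's states."""
    dim = len(fragment[0])
    return [sum(s[i] for s in fragment) / len(fragment) for i in range(dim)]

def learn_z_star(expert_fragments, lr=0.1, steps=200):
    """Fit z* by gradient descent on (1/N) * sum ||z - e||^2 over expert embeddings e."""
    targets = [encode(f) for f in expert_fragments]
    z = [0.0] * len(targets[0])
    for _ in range(steps):
        # Gradient of the mean squared distance w.r.t. z is (2/N) * sum (z - e).
        grad = [2.0 * sum(z[i] - e[i] for e in targets) / len(targets)
                for i in range(len(z))]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

def rank_fragments(z_star, fragments):
    """Return fragment indices ordered by latent distance to z*, closest first."""
    return sorted(range(len(fragments)),
                  key=lambda k: dist(encode(fragments[k]), z_star))

if __name__ == "__main__":
    experts = [[[0.0, 0.0], [2.0, 2.0]], [[3.0, 1.0]]]
    z_star = learn_z_star(experts)
    candidates = [[[2.0, 1.0]], [[10.0, 10.0]], [[1.0, 1.0]]]
    print(z_star, rank_fragments(z_star, candidates))
```

In this toy setting, the fragments closest to z^* are the ones a z^*-conditioned policy would be steered toward. The real method, as the TL;DR describes it, instead trains the policy with a supervised loss conditioned on the learned latent.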

Abstract

Offline reinforcement learning (RL) algorithms can learn policies that outperform the behavior policy by stitching suboptimal trajectories into better ones. Meanwhile, Decision Transformer (DT) casts RL as sequence modeling and shows competitive performance on offline RL benchmarks. However, recent studies demonstrate that DT lacks stitching capability, so endowing DT with this capability is vital to further improving its performance. To this end, we abstract trajectory stitching as expert matching and introduce ContextFormer. ContextFormer integrates contextual-information-based imitation learning (IL) with sequence modeling and stitches sub-optimal trajectory fragments by emulating the representations of a limited number of expert trajectories. We validate our approach from two perspectives. 1) Extensive experiments on D4RL benchmarks under IL settings show that ContextFormer achieves competitive performance in multiple IL settings. 2) More importantly, we compare ContextFormer with various competitive DT variants trained on identical datasets, and ContextFormer outperforms all of them.
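To illustrate what "casting RL as sequence modeling" means, the sketch below builds Decision Transformer's token sequence. DT conditions each timestep on the undiscounted return-to-go R_t = r_t + r_{t+1} + ... + r_T and interleaves it with states and actions. For contrast, it also builds a hypothetical latent-conditioned layout in which a context embedding z takes the place of the return token. That second layout is an assumption made for illustration, and ContextFormer's exact token arrangement may differ. All function names are illustrative.

```python
def returns_to_go(rewards):
    """Undiscounted return-to-go R_t = r_t + ... + r_T, as used by Decision Transformer."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def dt_tokens(states, actions, rewards):
    """Decision Transformer ordering: (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    seq = []
    for R, s, a in zip(returns_to_go(rewards), states, actions):
        seq += [("R", R), ("s", s), ("a", a)]
    return seq

def latent_tokens(z, states, actions):
    """Hypothetical latent-conditioned ordering (assumption): z replaces R at every step."""
    seq = []
    for s, a in zip(states, actions):
        seq += [("z", z), ("s", s), ("a", a)]
    return seq

if __name__ == "__main__":
    print(dt_tokens(["s1", "s2"], ["a1", "a2"], [1.0, 0.0]))
    print(latent_tokens("z*", ["s1", "s2"], ["a1", "a2"]))
```

Because return-to-go is computed from a single trajectory's own future rewards, it does not by itself point the model toward better fragments from other trajectories. This is the gap that a latent target such as z^* is meant to fill.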

Paper Structure (42 sections, 8 equations, 3 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Offline Reinforcement Learning (RL).
Imitation Learning (IL).
Preliminary
Reinforcement Learning (RL).
Imitation Learning (IL).
Hindsight Information Matching (HIM).
In-Context Learning (ICL).
Can expert matching endow stitching to transformer for decision making?
Notations.
Connection with Stitching.
Context Transformer (ContextFormer)
Method
Training Procedure.
...and 27 more sections

Figures (3)

Figure 1: Total Normalized Scores of ContextFormer, GDT, and Prompt-DT. (a) Performance comparison with Generalized DT (GDT). (b) Performance comparison with Prompt-DT. Specifically, we compare ContextFormer (LfD #5) with GDT on the same six offline datasets: hopper, walker2d, and halfcheetah, each in its medium (m) and medium-replay (mr) versions. We also compare ContextFormer (LfD #1) with Prompt-DT and PTDT-offline on hopper-m, walker2d-m, and halfcheetah-m. The original experimental results are reported in the Appendix.
Figure 2: Performance of ContextFormer in the Learning from Demonstration (LfD) setting as the number of expert trajectories is gradually increased.
Figure : ContextFormer
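The "Total Normalized Scores" in Figure 1 are assumed here to follow the usual D4RL convention. Under that convention, a raw return is rescaled so that a random policy scores 0 and the expert reference scores 100, and "total" is assumed to mean the sum over the datasets compared. The sketch below shows that arithmetic. The reference values in the usage lines are placeholders, not D4RL's actual constants.

```python
def normalized_score(raw, random_ref, expert_ref):
    """D4RL-style normalization: 0 = random-policy return, 100 = expert reference return."""
    return 100.0 * (raw - random_ref) / (expert_ref - random_ref)

def total_normalized_score(results):
    """Sum of per-dataset normalized scores.

    results maps a dataset name to a (raw_return, random_ref, expert_ref) tuple.
    """
    return sum(normalized_score(*v) for v in results.values())

if __name__ == "__main__":
    # Placeholder numbers for illustration only (not real D4RL reference scores).
    demo = {"env-a-m": (55.0, 10.0, 100.0), "env-b-mr": (100.0, 10.0, 100.0)}
    print(total_normalized_score(demo))
```

Summing normalized rather than raw returns keeps environments with very different reward scales from dominating the total.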
