Uncertainty-Aware Decision Transformer for Stochastic Driving Environments

Zenan Li; Fan Nie; Qiao Sun; Fang Da; Hang Zhao

TL;DR

UNREST, an UNcertainty-awaRE deciSion Transformer, plans in stochastic driving environments without introducing additional transition or complex generative models. It replaces the global returns in decision transformers with truncated returns that are less affected by the environment, so that the policy learns from the actual outcomes of actions rather than from environment transitions.

Abstract

Offline Reinforcement Learning (RL) enables policy learning without active interactions, making it especially appealing for self-driving tasks. Recent successes of Transformers have inspired casting offline RL as sequence modeling. This approach, however, fails in stochastic environments because it incorrectly assumes that identical actions consistently achieve the same goal. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates uncertainties by the conditional mutual information between transitions and returns. Having discovered the 'uncertainty accumulation' and 'temporal locality' properties of driving environments, we replace the global returns in decision transformers with truncated returns that are less affected by the environment, so as to learn from the actual outcomes of actions rather than from environment transitions. We also dynamically evaluate uncertainty at inference for cautious planning. Extensive experiments demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.
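To make the contrast between global and truncated returns concrete, the following is a minimal sketch and not the authors' implementation. It assumes that per-step uncertainty has already been thresholded into a boolean flag, that a certain step is conditioned on the sum of rewards up to the next uncertain step (or the end of the sequence), and that uncertain steps receive a dummy token, represented here as `None`. The function names `global_returns` and `truncated_returns` and the exact segment boundaries are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def global_returns(rewards: List[float]) -> List[float]:
    """Return-to-go over the whole remaining sequence, as in a standard decision transformer."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        out[t] = running
    return out


def truncated_returns(
    rewards: List[float], uncertain: List[bool]
) -> List[Optional[Tuple[float, int]]]:
    """Return-to-go truncated at the next uncertain step (illustrative assumption).

    For each certain step, yields (truncated_return, return_span), where the
    span counts the steps summed. Uncertain steps yield None (a dummy token).
    """
    if len(rewards) != len(uncertain):
        raise ValueError("rewards and uncertain must have the same length")
    targets: List[Optional[Tuple[float, int]]] = [None] * len(rewards)
    running, span = 0.0, 0
    for t in range(len(rewards) - 1, -1, -1):
        if uncertain[t]:
            # A segmentation position: reset so earlier steps do not sum past it.
            running, span = 0.0, 0
            continue
        running += rewards[t]
        span += 1
        targets[t] = (running, span)
    return targets


rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
uncertain = [False, False, True, False, False]
print(global_returns(rewards))
print(truncated_returns(rewards, uncertain))
```

In this toy sequence the global return at step 0 sums all five rewards, whereas the truncated return at step 0 sums only the two rewards before the uncertain step, so it does not depend on what happens after that step.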

Paper Structure (57 sections, 3 theorems, 24 equations, 11 figures, 10 tables)

Introduction
Related Works
Offline RL as Sequence Modeling
Uncertainty Estimation
Preliminary
Online and Offline RL
Offline RL as Sequence Modeling
Approach: UNREST
Model Overview
Transformers for Uncertainty Estimation
Transformer for Sequential Decision-Making
Segmentation Strategy
Policy Formulation
Uncertainty-guided Planning
Experiments
...and 42 more sections

Key Result

Proposition 1

Assuming that the rewards obtained are determined by transitions $(s,a\rightarrow s')$ at each timestep and UNREST is perfectly trained to fit the expert demonstrations, then the discrepancy between target truncated returns and UNREST's rollout returns is bounded by a factor of environmental stochas

Figures (11)

Figure 1: Motivations of UNREST. (a): Example driving scenario where the variance of return increases when accounting for multiple tasks. (b): Calibration results of return distribution over future 1,000 steps are more uncertain than 100 steps. (c): Rollout returns/distances of sequences maximizing the return of future 100, 500, and 1,000 steps in the dataset are close to each other.
Figure 2: Overview of UNREST. Lower: Two return prediction transformers are trained for uncertainty estimation. The sequence is then segmented into certain (no background) and uncertain (orange background) parts w.r.t. estimated uncertainties, with 'certain parts' conditioned on returns to the next segmentation positions, and dummy tokens in 'uncertain parts'. Upper: The same architecture as DTs is used for action prediction, except that we add a return-span embedding to the truncated return embedding, and concatenate the global return embedding to the transformer output.
Figure 2: Ablation study results for UNREST on train town and train weather conditions.
Figure 3: Uncertainty-guided planning
Figure 4: UNREST performs well at failing cases of TT and SPLT. White rectangles are ego-vehicles.
...and 6 more figures

Theorems & Definitions (3)

Proposition 1: UNREST Alignment Bound
Theorem 1: UNREST Alignment Bound
Lemma 1: Determinism Equality
