Uncertainty-Aware Decision Transformer for Stochastic Driving Environments

Zenan Li, Fan Nie, Qiao Sun, Fang Da, Hang Zhao

TL;DR

UNREST, an UNcertainty-awaRE deciSion Transformer, plans in stochastic driving environments without introducing additional transition or complex generative models. It replaces the global returns used in decision transformers with truncated returns that are less affected by the environment, so that the model learns from the actual outcomes of actions rather than from environment transitions.
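To make the distinction concrete, here is a minimal Python sketch, not the authors' code, that contrasts the global return-to-go standard decision transformers condition on with a truncated return summed over a limited look-ahead window. The fixed `horizon` parameter and the helper names `global_returns` and `truncated_returns` are illustrative assumptions. UNREST itself truncates at uncertainty-based segmentation positions rather than at a fixed horizon.

```python
from typing import List


def global_returns(rewards: List[float]) -> List[float]:
    """Return-to-go from each step to the end of the episode (standard DT conditioning)."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Accumulate rewards backwards so out[t] = sum(rewards[t:]).
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        out[t] = running
    return out


def truncated_returns(rewards: List[float], horizon: int) -> List[float]:
    """Sum of rewards over the next `horizon` steps only (window clipped at episode end).

    Illustrative assumption: a fixed horizon stands in for UNREST's
    uncertainty-dependent truncation points.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    # prefix[i] = sum(rewards[:i]) gives O(1) window sums.
    prefix = [0.0]
    for r in rewards:
        prefix.append(prefix[-1] + r)
    n = len(rewards)
    return [prefix[min(t + horizon, n)] - prefix[t] for t in range(n)]
```

With `rewards = [1.0, 0.0, 2.0, -1.0, 3.0]`, the global returns are `[5.0, 4.0, 4.0, 2.0, 3.0]`, while a 2-step truncated return gives `[1.0, 2.0, 1.0, 2.0, 3.0]`. Each truncated target depends only on nearby rewards, so stochastic events far in the future no longer influence early conditioning.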

Abstract

Offline Reinforcement Learning (RL) enables policy learning without active interactions, making it especially appealing for self-driving tasks. Recent successes of Transformers have inspired casting offline RL as sequence modeling, which, however, fails in stochastic environments because it incorrectly assumes that identical actions consistently achieve the same goal. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates uncertainties by the conditional mutual information between transitions and returns. Having identified the 'uncertainty accumulation' and 'temporal locality' properties of driving environments, we replace the global returns in decision transformers with truncated returns that are less affected by the environment, so that the model learns from the actual outcomes of actions rather than from environment transitions. We also dynamically evaluate uncertainty at inference for cautious planning. Extensive experiments demonstrate UNREST's superior performance in various driving scenarios and the effectiveness of our uncertainty estimation strategy.
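One way to read "conditional mutual information between transitions and returns" is through the standard identity $I(R; s_{t+1} \mid h_t) = \mathbb{E}_{s_{t+1}}\left[ D_{\mathrm{KL}}\big(p(R \mid h_t, s_{t+1}) \,\|\, p(R \mid h_t)\big) \right]$, where $h_t$ denotes the history up to step $t$. The sketch below assumes two return predictors that output logits over discretized return bins: one also sees the next state and one does not, loosely mirroring the two return prediction transformers mentioned in the Figure 2 caption. The per-step KL between their predictions is then a single-sample estimate of that conditional mutual information. The function names, the return binning, and the `eps` smoothing are assumptions for illustration, not UNREST's published implementation.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def transition_uncertainty(logits_with_next: np.ndarray,
                           logits_without_next: np.ndarray,
                           eps: float = 1e-12) -> np.ndarray:
    """Per-step KL(p(R | h_t, s_{t+1}) || p(R | h_t)) over discretized return bins.

    Both inputs have shape (T, num_bins). Averaged over observed next states,
    this KL estimates the conditional mutual information I(R; s_{t+1} | h_t).
    """
    p = softmax(logits_with_next)     # hypothetical predictor that also sees s_{t+1}
    q = softmax(logits_without_next)  # hypothetical predictor that sees only h_t
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
```

A step where seeing the next state barely changes the predicted return distribution gets a value near zero; a large value flags a transition whose outcome strongly affects the return, i.e. a stochastic, uncertain step.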

Paper Structure (57 sections, 3 theorems, 24 equations, 11 figures, 10 tables)

Key Result

Proposition 1

Assuming that the rewards obtained are determined by transitions $(s,a\rightarrow s')$ at each timestep and that UNREST is perfectly trained to fit the expert demonstrations, the discrepancy between the target truncated returns and UNREST's rollout returns is bounded by a factor of environmental stochasticity ...

Figures (11)

  • Figure 1: Motivations of UNREST. (a): An example driving scenario where the variance of the return increases when multiple tasks are taken into account. (b): Calibration results show that the return distribution over the next 1,000 steps is more uncertain than that over the next 100 steps. (c): Sequences in the dataset that maximize the return over the next 100, 500, and 1,000 steps achieve rollout returns/distances close to each other.
  • Figure 2: Overview of UNREST. Lower: Two return prediction transformers are trained for uncertainty estimation. The sequence is then segmented into certain (no background) and uncertain (orange background) parts according to the estimated uncertainties; 'certain parts' are conditioned on returns up to the next segmentation position, while 'uncertain parts' use dummy tokens. Upper: The same architecture as DTs is used for action prediction, except that a return-span embedding is added to the truncated return embedding and the global return embedding is concatenated to the transformer output.
  • Figure 2: Ablation study results for UNREST on train town and train weather conditions.
  • Figure 3: Uncertainty-guided planning
  • Figure 4: UNREST performs well in cases where TT and SPLT fail. White rectangles are ego-vehicles.
  • ...and 6 more figures
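The segmentation described in the Figure 2 caption can be sketched as follows, under several assumptions that the summary does not confirm: a step counts as uncertain when its estimated uncertainty exceeds a fixed `threshold`; each certain step is conditioned on the sum of rewards from that step up to, but not including, the next uncertain step (or the episode end); the return span is the number of steps that sum covers; and `None` stands in for the dummy token. The name `segment_returns` is hypothetical.

```python
from typing import List, Optional, Tuple

DUMMY: Optional[float] = None  # hypothetical stand-in for the dummy return token


def segment_returns(rewards: List[float], uncertainty: List[float],
                    threshold: float) -> Tuple[List[Optional[float]], List[int]]:
    """Truncated return targets and return spans for a segmented sequence.

    Assumptions: steps with uncertainty > threshold are segmentation positions
    and get DUMMY targets with span 0; every other step sums rewards up to
    (excluding) the next segmentation position, or to the episode end.
    """
    n = len(rewards)
    if len(uncertainty) != n:
        raise ValueError("rewards and uncertainty must have the same length")
    targets: List[Optional[float]] = [DUMMY] * n
    spans = [0] * n
    running, span = 0.0, 0
    # Walk backwards so each certain step sees the rewards up to the next cut.
    for t in range(n - 1, -1, -1):
        if uncertainty[t] > threshold:
            running, span = 0.0, 0  # uncertain step: reset and keep the dummy token
        else:
            running += rewards[t]
            span += 1
            targets[t] = running
            spans[t] = span
    return targets, spans
```

For five unit rewards with uncertainties `[0.1, 0.2, 0.9, 0.1, 0.1]` and `threshold=0.5`, the third step becomes a segmentation position, giving targets `[2.0, 1.0, None, 2.0, 1.0]` and spans `[2, 1, 0, 2, 1]`.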

Theorems & Definitions (3)

  • Proposition 1: UNREST Alignment Bound
  • Theorem 1: UNREST Alignment Bound
  • Lemma 1: Determinism Equality