DRL-ORA: Distributional Reinforcement Learning with Online Risk Adaption

Yupeng Wu; Wenyun Li; Wenjie Huang; Chin Pang Ho

TL;DR

This work proposes a new framework, Distributional RL with Online Risk Adaptation (DRL-ORA), which quantifies both epistemic and implicit aleatory uncertainties in a unified manner and dynamically adjusts the epistemic risk levels by solving a total variation minimization problem online.

Abstract

One of the main challenges in reinforcement learning (RL) is that the agent has to make decisions that influence its future performance without complete knowledge of the environment. Dynamically adjusting the level of epistemic risk during learning can help the agent reach reliable policies in safety-critical settings more efficiently. In this work, we propose a new framework, Distributional RL with Online Risk Adaptation (DRL-ORA). This framework quantifies both epistemic and implicit aleatory uncertainties in a unified manner and dynamically adjusts the epistemic risk levels by solving a total variation minimization problem online. The framework unifies existing variants of risk adaptation approaches and offers better explainability and flexibility. The risk levels are selected efficiently via a grid search using a Follow-The-Leader-type algorithm, where the offline oracle also corresponds to a "satisficing measure" under a specially modified loss function. We show that DRL-ORA outperforms existing methods that rely on fixed risk levels or manually designed risk-level adaptation in multiple classes of tasks.
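
The Follow-The-Leader (FTL) grid search described in the abstract can be sketched as follows, under stated assumptions: the risk level is chosen from a fixed grid of candidates, and after each round a loss is revealed for every candidate. The per-round loss below (squared distance to a hypothetical target level) is a placeholder and is not the paper's total-variation objective; `follow_the_leader` is a hypothetical name, not the authors' code.

```python
from typing import Callable, List, Sequence


def follow_the_leader(
    grid: Sequence[float],
    loss_fns: Sequence[Callable[[float], float]],
) -> List[float]:
    """Return the risk level played at each round by FTL over a fixed grid."""
    cumulative = [0.0] * len(grid)  # total past loss of each candidate risk level
    choices: List[float] = []
    for loss in loss_fns:
        # Play the candidate with the smallest cumulative loss so far (ties -> lowest index).
        best = min(range(len(grid)), key=lambda i: cumulative[i])
        choices.append(grid[best])
        # Once this round's loss is revealed, update every candidate's running total.
        for i, alpha in enumerate(grid):
            cumulative[i] += loss(alpha)
    return choices


# Hypothetical per-round losses: squared distance to a target level revealed after each round.
targets = [0.9, 0.9, 0.1]
losses = [lambda a, t=t: (a - t) ** 2 for t in targets]
print(follow_the_leader([0.1, 0.5, 0.9], losses))
```

FTL plays the empirically best candidate at every round; the first round falls back to the first grid point because no loss has been observed yet.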

Paper Structure (20 sections, 8 theorems, 46 equations, 8 figures, 5 tables, 2 algorithms)

Introduction
Related Work
Preliminaries
Distributional Reinforcement Learning (DRL)
Adaptive Risk-awareness (Risk Tendency)
Methodology: DRL-ORA
A Generalized Non-convex Learning Perspective
Ensemble Networks for Epistemic Uncertainty Quantification
Regret Minimization and Analysis
Extension: Relation to Satisficing Measure
Experiments and Applications
Atari Games
Nano Drone Navigation
Knapsack
Further Remarks
...and 5 more sections

Key Result

Theorem 4

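For an arbitrarily small $\epsilon>0$, the set $\mathcal{A}$ can be discretized into a finite set $\mathcal{A}^{\prime}$ such that the Hausdorff distance between the two sets satisfies $d_H(\mathcal{A},\mathcal{A}^{\prime}) = \max\left\{\sup_{a\in\mathcal{A}}\inf_{a^{\prime}\in\mathcal{A}^{\prime}}\|a-a^{\prime}\|,\ \sup_{a^{\prime}\in\mathcal{A}^{\prime}}\inf_{a\in\mathcal{A}}\|a-a^{\prime}\|\right\}\le\epsilon$. In addition, by choosing $\epsilon = O(T^{-1/2})$, Algorithm 1 achieves an expected regret of $O(T^{1/2})$.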
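
To illustrate the discretization step in Theorem 4, the sketch below assumes the set of risk levels is a closed interval $[lo, hi]$ (for example, $\alpha \in [0, 1]$) and that the target is a Hausdorff distance of at most $\epsilon$. It builds a uniform grid whose consecutive points are at most $2\epsilon$ apart, so every point of the interval lies within $\epsilon$ of the grid. The helper names and the horizon `T` are hypothetical. With $\epsilon = T^{-1/2}$, this grid has $O(1/\epsilon) = O(T^{1/2})$ points.

```python
import math
from typing import List, Sequence


def discretize_interval(lo: float, hi: float, eps: float) -> List[float]:
    """Uniform grid on [lo, hi] whose Hausdorff distance to the interval is at most eps."""
    # Gaps of width at most 2*eps put every point of [lo, hi] within eps of a grid point.
    n_gaps = max(1, math.ceil((hi - lo) / (2 * eps)))
    step = (hi - lo) / n_gaps
    # Pin the last point to hi so floating-point drift cannot push it outside the interval.
    return [lo + k * step for k in range(n_gaps)] + [hi]


def hausdorff_interval_to_grid(lo: float, hi: float, grid: Sequence[float]) -> float:
    """Hausdorff distance between [lo, hi] and a finite grid contained in it."""
    pts = sorted(grid)
    # Every grid point lies in the interval, so only sup_{a in [lo, hi]} dist(a, grid) matters.
    # That supremum is attained at an endpoint or at the midpoint of the widest gap.
    d = max(pts[0] - lo, hi - pts[-1])
    for a, b in zip(pts, pts[1:]):
        d = max(d, (b - a) / 2)
    return d


# Usage with a hypothetical horizon T: eps = T^{-1/2} yields O(T^{1/2}) grid points.
T = 10_000
eps = T ** -0.5
grid = discretize_interval(0.0, 1.0, eps)
print(len(grid), hausdorff_interval_to_grid(0.0, 1.0, grid) <= eps + 1e-12)
```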

Figures (8)

Figure 1: "IQN alpha:1" means that $\alpha$ is fixed at $0.1$ throughout all episodes. "IQN alpha:191" means that $\alpha$ is adjusted manually over the episodes, increasing linearly from $0.1$ to $0.9$ and then decreasing linearly back to $0.1$. The same naming convention applies to the other settings.
Figure 2: Average episodic scores in Nano Drone navigation task. The shaded area represents a 90% confidence interval.
Figure 3: Testing results with 90% confidence intervals. "Comp." denotes Composite IQN. "Optimal episodic reward" is the benchmark computed by dynamic programming (DP).
Figure 4: Reward curves on the Knapsack task.
Figure 5: Graphical illustration of Problem (Transmin)
...and 3 more figures
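
The manually designed schedule described in the Figure 1 caption ("IQN alpha:191") can be written as a piecewise-linear function of the episode index. This is a minimal sketch: the caption does not say where the peak occurs, so the sketch assumes $\alpha$ reaches $0.9$ at the midpoint of training, and `triangular_alpha` is a hypothetical name.

```python
def triangular_alpha(episode: int, n_episodes: int,
                     start: float = 0.1, peak: float = 0.9) -> float:
    """Risk level for a given episode under a rise-then-fall linear schedule."""
    if n_episodes < 2:
        return start  # a single episode has no schedule to follow
    # Assumption: the peak is reached halfway through training.
    half = (n_episodes - 1) / 2
    if episode <= half:
        frac = episode / half                     # rising leg: start -> peak
    else:
        frac = (n_episodes - 1 - episode) / half  # falling leg: peak -> start
    return start + (peak - start) * frac


# Usage: five episodes give 0.1, 0.5, 0.9, 0.5, 0.1 (rounded for display).
print([round(triangular_alpha(e, 5), 2) for e in range(5)])
```

A fixed setting such as "IQN alpha:1" corresponds to returning `start` for every episode.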

Theorems & Definitions (16)

Example 1
Example 2 (Acerbi and Tasche, 2002)
Example 3 (Dhaene et al., 2012)
Theorem 4
Theorem 5
Theorem 6
Lemma 7
Proof
Lemma 8
Proof
...and 6 more
