Optimal Transport-Assisted Risk-Sensitive Q-Learning

Zahra Shahrooei; Ali Baheri

TL;DR

We present a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance agent safety. Compared to the traditional Q-learning algorithm, it achieves faster convergence to a stable policy.

Abstract

The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance, typically without considering risk or safety. In contrast, safe reinforcement learning aims to mitigate or avoid unsafe states. This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance agent safety. By integrating optimal transport into the Q-learning framework, our approach seeks to optimize the policy's expected return while minimizing the Wasserstein distance between the policy's stationary distribution and a predefined risk distribution, which encapsulates safety preferences from domain experts. We validate the proposed algorithm in a Gridworld environment. The results indicate that, compared to the traditional Q-learning algorithm, our method significantly reduces the frequency of visits to risky states and achieves faster convergence to a stable policy.
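To make the Wasserstein term in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes states lie on a one-dimensional chain with unit spacing (the paper's Gridworld is two-dimensional), where the Wasserstein-1 distance reduces to the summed absolute difference of the two cumulative distributions. The function names `wasserstein_1d` and `visitation_distribution`, and the example distributions, are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions on states 0..n-1.

    Assumption: states sit on a line with unit spacing, so the distance
    equals the sum of absolute differences between the two CDFs.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def visitation_distribution(visited_states, n_states):
    """Empirical state-visitation distribution from a list of visited states.

    This stands in for the policy's stationary distribution (an assumption
    made for this sketch).
    """
    if not visited_states:
        raise ValueError("need at least one visited state")
    counts = [0] * n_states
    for s in visited_states:
        counts[s] += 1
    total = len(visited_states)
    return [c / total for c in counts]


# Hypothetical expert preference: all mass on the two "safe" states 0 and 1.
risk_distribution = [0.5, 0.5, 0.0, 0.0]

safe_visits = visitation_distribution([0, 1, 1, 0], n_states=4)
risky_visits = visitation_distribution([2, 3, 3, 2], n_states=4)

print(wasserstein_1d(safe_visits, risk_distribution))   # 0.0
print(wasserstein_1d(risky_visits, risk_distribution))  # 2.0
```

A policy whose visitation matches the expert's preferred distribution incurs zero distance, while one that concentrates on the states the expert assigns no mass to incurs a larger distance. How this distance enters the Q-learning update is specified in the paper itself and is not reproduced here.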

Paper Structure (8 sections, 5 equations, 4 figures, 1 algorithm)

Introduction
Preliminaries
Markov Decision Processes
Q-learning Algorithm
Optimal Transport Theory
Risk-sensitive Reinforcement Learning with Optimal Transport
Simulations and Results
Conclusions

Figures (4)

Figure 1: Gridworld environment.
Figure 2: Average return values across $500$ episodes for $5$ random seeds.
Figure 3: Average episode length for $5$ random seeds.
Figure 4: Number of obstacle collisions over $500$ episodes for $5$ random seeds.
