Policy Evaluation in Distributional LQR (Extended Version)

Zifan Wang; Yulong Gao; Siyi Wang; Michael M. Zavlanos; Alessandro Abate; Karl H. Johansson

TL;DR

This article derives a closed-form expression for the distribution of the random return in discounted LQR, valid for any exogenous disturbance that is independent and identically distributed, and analyzes the sensitivity of the return distribution to model perturbations.

Abstract

Distributional reinforcement learning (DRL) enhances the understanding of the effects of randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard reinforcement learning. A challenge in DRL is that policy evaluation typically relies on a representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for the special class of DRL problems that rely on a discounted linear quadratic regulator (LQR), which we call distributional LQR. Specifically, we provide a closed-form expression for the distribution of the random return, which applies to any exogenous disturbance that is independent and identically distributed (i.i.d.). We show that the variance of the random return is bounded if the fourth moment of the exogenous disturbance is bounded. Furthermore, we investigate the sensitivity of the return distribution to model perturbations. While the proposed exact return distribution consists of infinitely many random variables, we show that it can be well approximated by a finite number of random variables, and that the associated approximation error can be analytically bounded under mild assumptions. When the model is unknown, we propose a model-free approach for estimating the return distribution, supported by sample complexity guarantees. Finally, we extend our approach to partially observable linear systems. Numerical experiments are provided to illustrate the theoretical results.
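
To make the notion of a random return concrete, the following is a minimal Monte Carlo sketch, not the paper's algorithm. It assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with $u_t = k x_t$, stage cost $q x_t^2 + r u_t^2$, zero-mean Gaussian disturbance with standard deviation `sigma`, and a truncated horizon; all numerical values and the function name `sample_returns` are hypothetical. Under these assumptions the sample mean should be close to the classical expected return $p x_0^2 + \frac{\gamma}{1-\gamma} p \sigma^2$, where $p$ solves the scalar discounted Lyapunov equation.

```python
import numpy as np

def sample_returns(a, b, k, q, r, gamma, x0, sigma,
                   horizon=200, n_samples=20000, seed=0):
    """Draw samples of the truncated discounted return under u = k * x."""
    rng = np.random.default_rng(seed)
    x = np.full(n_samples, float(x0))   # one state per independent rollout
    g = np.zeros(n_samples)             # accumulated discounted cost
    discount = 1.0
    for _ in range(horizon):
        u = k * x
        g += discount * (q * x**2 + r * u**2)
        # i.i.d. disturbance (Gaussian here, purely as an illustrative choice)
        x = a * x + b * u + sigma * rng.standard_normal(n_samples)
        discount *= gamma
    return g

# Hypothetical scalar example
a, b, k, q, r = 1.0, 1.0, -0.5, 1.0, 1.0
gamma, x0, sigma = 0.9, 1.0, 0.1

returns = sample_returns(a, b, k, q, r, gamma, x0, sigma)

# Closed-form mean of the return for comparison
a_k = a + b * k
p = (q + r * k**2) / (1.0 - gamma * a_k**2)
expected_mean = p * x0**2 + gamma / (1.0 - gamma) * p * sigma**2

print("sample mean:", returns.mean(), "closed-form mean:", expected_mean)
print("5%/50%/95% quantiles:", np.quantile(returns, [0.05, 0.5, 0.95]))
```

The quantiles show what the expected value alone hides: the spread of the return induced by the disturbance. Swapping the Gaussian draw for another i.i.d. distribution changes the shape of the empirical return distribution.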

Paper Structure

This paper contains 23 sections, 11 theorems, 77 equations, 4 figures, 3 tables, and 1 algorithm.

Introduction
Related Work
Contributions
Organisation and Notations
Problem Statement
Classical Discounted LQR
Distributional LQR
Main Results on Distributional LQR
Characterisation of the Random Return
Bounded Variance of the Random Return
Sensitivity Analysis of the Return Distribution
Model-Based Approximation of the Return Distribution
Model-Free Approximation of the Return Distribution
Extension to Partially Observable Systems
Experiments
...and 8 more sections

Key Result

Theorem 1

Suppose that the feedback gain $K$ is stabilizing, i.e., $A_K = A + BK$ is stable. Let $G^{K}(x)$ be the random variable defined in eq:dist_func, where $P$ is obtained from the Lyapunov equation $P = Q + K^{\rm{T}} R K + \gamma A_K^{\rm{T}} P A_K$ and the random variables $w_k \sim \mathcal{D}$ are mutually independent for all $k \in \mathbb{N}$. Then $G^{K}(x)$ is a fixed-point solution to the ...
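
The matrix $P$ in the theorem can be computed numerically. The sketch below is one possible approach, not taken from the paper: it solves the discounted Lyapunov equation $P = Q + K^{\rm{T}} R K + \gamma A_K^{\rm{T}} P A_K$ by vectorization with a Kronecker product. The system matrices, the gain, and the helper name `discounted_lyapunov` are hypothetical and chosen only so that $A_K$ is stable.

```python
import numpy as np

def discounted_lyapunov(A, B, K, Q, R, gamma):
    """Solve P = Q + K^T R K + gamma * A_K^T P A_K with A_K = A + B K."""
    A_K = A + B @ K
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # vec(A_K^T P A_K) = (A_K^T kron A_K^T) vec(P), so the equation is linear in vec(P)
    lhs = np.eye(n * n) - gamma * np.kron(A_K.T, A_K.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    P = 0.5 * (P + P.T)  # remove round-off asymmetry
    return P, A_K, M

# Hypothetical two-state example with a stabilizing gain
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
K = np.array([[-1.0, -1.5]])
Q = np.eye(2)
R = np.array([[0.1]])
gamma = 0.9

P, A_K, M = discounted_lyapunov(A, B, K, Q, R, gamma)

# Spectral radius of A_K (below 1 means the closed loop is stable)
spectral_radius = np.max(np.abs(np.linalg.eigvals(A_K)))

# Residual of the Lyapunov equation (should be near machine precision)
residual = np.max(np.abs(P - (M + gamma * A_K.T @ P @ A_K)))

print("spectral radius of A_K:", spectral_radius)
print("P =\n", P)
print("Lyapunov residual:", residual)
```

For large state dimensions the $n^2 \times n^2$ linear system becomes expensive; iterating the fixed-point map $P \leftarrow Q + K^{\rm{T}} R K + \gamma A_K^{\rm{T}} P A_K$ until convergence is a common alternative under the same stability assumption.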

Figures (4)

Figure 1: The PDFs of three types of disturbance and of their corresponding random costs in LQR. The PDFs of the random costs are generated by Algorithm 1 in this paper.
Figure 2: Return distribution and its approximation with a finite number of random variables for different values of $\gamma$ and $x_0$ in LQR. Alg. 1 denotes the distribution returned by Algorithm 1 and $f_N$ denotes the distribution of the approximated random return $G^K_N(x_0)$.
Figure 3: Original and perturbed return distributions for different values of $\gamma$, $\epsilon_A$ and $\epsilon_B$ in LQR.
Figure 4: Return distribution and its approximation with a finite number of random variables for different values of $\gamma$ in LQG. MC denotes the distribution estimated using the Monte Carlo method and $f_N$ denotes the distribution of the approximated random return $G^{KL}_N(\bar{x})$.

Theorems & Definitions (16)

Example 1
Theorem 1
Remark 1
Theorem 2
Theorem 3
Theorem 4
Remark 2
Theorem 5
Remark 3
Corollary 1
...and 6 more
