On Policy Stochasticity in Mutual Information Optimal Control of Linear Systems

Shoju Enami, Kenji Kashima

Abstract

In recent years, mutual information optimal control has been proposed as an extension of maximum entropy optimal control. Both approaches introduce regularization terms to render the policy stochastic, and it is important to theoretically clarify the relationship between the temperature parameter (i.e., the coefficient of the regularization term) and the stochasticity of the policy. Unlike in maximum entropy optimal control, this relationship remains unexplored in mutual information optimal control. In this paper, we investigate this relationship for a mutual information optimal control problem (MIOCP) of discrete-time linear systems. After extending the result of a previous study of the MIOCP, we establish the existence of an optimal policy of the MIOCP, and then derive the respective conditions on the temperature parameter under which the optimal policy becomes stochastic and deterministic. Furthermore, we derive the respective conditions on the temperature parameter under which the policy obtained by an alternating optimization algorithm becomes stochastic and deterministic. The validity of the theoretical results is demonstrated through numerical experiments.

Paper Structure

This paper contains 29 sections, 100 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Rough sketch of how the optimal policy $\pi_{k}^{ME}$ (in maximum entropy optimal control) and the optimal policy $\pi_{k}^{MI}$ (in mutual information optimal control) relate to the temperature parameter $\varepsilon$.
  • Figure 2: The trajectories of $\Sigma_{\rho_{0}^{(i)}},\ldots, \Sigma_{\rho_{4}^{(i)}}$ for Problem \ref{prob:MIOCP} with $T=5$ and $\varepsilon = 10^{-3}, 10^{-1}, 10$, and $10^{3}$.
  • Figure 3: The trajectories of $\Sigma_{\rho_{0}^{(i)}},\ldots, \Sigma_{\rho_{4}^{(i)}}$ for Problem \ref{prob:MIOCP} with $T=5$ and $\varepsilon = 10^{-3}, 10^{-1}, 10$, and $10^{3}$.