Exploratory Utility Maximization Problem with Tsallis Entropy

Chen Ziyi, Gu Jia-wen

TL;DR

The exploratory utility maximization problem is ill-posed in certain cases because of over-exploration. In the two examples studied, the means of the optimal exploratory policies coincide with that of the classical counterpart.

Abstract

We study the expected utility maximization problem with a constant relative risk aversion utility function in a complete market under the reinforcement learning framework. To induce exploration, we introduce the Tsallis entropy regularizer, which generalizes the commonly used Shannon entropy. Unlike the classical Merton problem, which is always well-posed and admits closed-form solutions, the exploratory utility maximization problem turns out to be ill-posed in certain cases because of over-exploration. With a carefully selected primary temperature function, we investigate two specific examples, for which we fully characterize well-posedness and provide semi-closed-form solutions. Interestingly, in one example the optimal strategy is the well-known Gaussian distribution, while in the other it is the rarely seen Wigner semicircle distribution, which is equivalent to a scaled Beta distribution. The means of the two optimal exploratory policies coincide with that of the classical counterpart. In addition, we examine the convergence of the value function and the optimal exploratory strategy as the exploration vanishes. Finally, we design a reinforcement learning algorithm and conduct numerical experiments to demonstrate the advantages of reinforcement learning.
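The abstract's remark that the Wigner semicircle distribution is a scaled Beta distribution can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names and the parameters `center` and `radius` are hypothetical, and it relies on the standard fact that a semicircle law on `[center - radius, center + radius]` is a Beta(3/2, 3/2) variable mapped affinely from `[0, 1]` onto that interval.

```python
import math

def semicircle_pdf(x, center, radius):
    """Wigner semicircle density supported on [center - radius, center + radius]."""
    d = x - center
    if abs(d) >= radius:
        return 0.0
    return 2.0 / (math.pi * radius ** 2) * math.sqrt(max(0.0, radius ** 2 - d ** 2))

def beta_pdf(u, a, b):
    """Beta(a, b) density on (0, 1), evaluated via log-gamma for stability."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(u) + (b - 1.0) * math.log(1.0 - u) - log_norm)

def scaled_beta_pdf(x, center, radius):
    """Density of X = center + radius * (2U - 1) with U ~ Beta(3/2, 3/2)."""
    u = (x - center + radius) / (2.0 * radius)
    # Change of variables: dx = 2 * radius * du, hence the Jacobian factor.
    return beta_pdf(u, 1.5, 1.5) / (2.0 * radius)

def max_abs_gap(center, radius, n=201):
    """Largest pointwise difference between the two densities on a grid."""
    xs = [center - radius + 2.0 * radius * k / (n - 1) for k in range(n)]
    return max(
        abs(semicircle_pdf(x, center, radius) - scaled_beta_pdf(x, center, radius))
        for x in xs
    )

if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    print(max_abs_gap(center=0.4, radius=1.3))
```

Both densities are symmetric about `center`, so the mean of the scaled Beta variable equals `center`. In this sketch that parameter plays the role of the policy mean, which the abstract says coincides with the classical counterpart.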

Paper Structure

This paper contains 21 sections, 12 theorems, 94 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Proposition 2.1

Let $\lambda(t,w)=\lambda(t)$ be a positive, continuous, time-dependent primary temperature function and let $\beta=1$. The control problem $(pb_ex)$ is well-posed for $p \leq 0$ but becomes ill-posed for $0<p<1$. Moreover, when $p=0$, a closed-form optimal strategy is given by and the corresponding optimal value function is

Figures (5)

  • Figure 1: Numerical Solutions of the ODE under Different Parameter Selections when $b<0$
  • Figure 2: Numerical Solutions of the ODE under Different Parameter Selections when $b>0$
  • Figure 3: Dynamics of $\varphi$
  • Figure 4: Approximation of $f$
  • Figure 5: Performance under Different $\gamma$

Theorems & Definitions (26)

  • Definition 1
  • Proposition 2.1
  • proof
  • Remark 1
  • Proposition 3.1
  • Lemma 3.1
  • proof
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • ...and 16 more