Entropy Regularization as Robustness under Bayesian Drift Uncertainty

Andy Au

Abstract

We study entropy-regularized mean-variance portfolio optimization under Bayesian drift uncertainty. Gaussian policies remain optimal under partial information, the value function is quadratic in wealth, and the belief-dependent coefficients admit closed-form solutions. The mean control is identical to deterministic Bayesian Markowitz feedback; entropy regularization affects only the policy variance. This variance does not affect information gain; it instead provides belief-dependent robustness. Notably, the optimal policy variance increases with posterior conviction $|m_t|$, forcing greater action randomization when the mean position is most aggressive.

Paper Structure (28 sections, 8 theorems, 33 equations, 2 figures)


Key Result

Proposition 2.2

The posterior remains Gaussian: $\rho \mid \mathcal{F}_t \sim \mathcal{N}(m_t, P_t)$, with dynamics $dm_t = P_t\,d\widehat{W}_t$ and $dP_t = -P_t^2\,dt$, where the innovation $d\widehat{W}_t := dY_t - m_t\,dt$ is an $\mathcal{F}_t$-Brownian motion. The posterior variance has the closed form $P_t = \frac{P_0}{1 + P_0 t}$.
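As a numerical sanity check on Proposition 2.2, the following minimal sketch simulates a Kalman--Bucy filter with an Euler scheme and compares the result with the closed form $P_t = P_0/(1 + P_0 t)$. It rests on assumptions not stated in this summary: the observation process is taken to be $dY_t = \rho\,dt + dW_t$ with unit noise, and the filter updates are taken to be $dm_t = P_t\,d\widehat{W}_t$, $dP_t = -P_t^2\,dt$, which is the form consistent with the stated closed-form variance. The function names, step count, and seed are illustrative only.

```python
import numpy as np


def posterior_variance(t, p0):
    """Closed-form posterior variance P_t = P_0 / (1 + P_0 t)."""
    return p0 / (1.0 + p0 * t)


def simulate_filter(rho, m0, p0, T, n_steps, rng):
    """Euler discretization of an assumed Kalman-Bucy filter.

    Assumed observation model (not taken from the paper's text):
        dY_t = rho dt + dW_t
    Assumed filter updates:
        dm_t = P_t (dY_t - m_t dt),   dP_t = -P_t^2 dt
    Returns the terminal posterior mean, posterior variance, and Y_T.
    """
    dt = T / n_steps
    m, P, Y = m0, p0, 0.0
    for _ in range(n_steps):
        dY = rho * dt + np.sqrt(dt) * rng.standard_normal()
        innovation = dY - m * dt      # increment of the innovation process
        m += P * innovation           # posterior mean update
        P += -P**2 * dt               # Riccati equation for the variance
        Y += dY
    return m, P, Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # P_0 = 1, T = 1, rho = 1 match the parameters quoted for Figure 1.
    m_T, P_T, Y_T = simulate_filter(rho=1.0, m0=0.0, p0=1.0, T=1.0,
                                    n_steps=10_000, rng=rng)
    print("simulated P_T  :", P_T)
    print("closed-form P_T:", posterior_variance(1.0, 1.0))
    print("simulated m_T  :", m_T)
```

Under these assumptions the simulated variance should track the closed form up to discretization error, and the simulated mean should agree with the conjugate Gaussian update $m_T = (m_0 + P_0 Y_T)/(1 + P_0 T)$ for the same observation path.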

Figures (2)

  • Figure 1: (a) Sample posterior mean paths $m_t$ under Kalman--Bucy filtering. (b) Optimal policy variance $\varsigma^{*2}(t,m_t)$ along each path. Stronger conviction (larger $|m_t|$) produces greater policy randomization. Parameters: $P_0 = 1$, $T = 1$, $\tau = 1$, $\sigma = 0.2$, $\rho = 1$.
  • Figure 2: Heatmap of optimal policy variance $\varsigma^{*2}(t,m)$ over the $(t,m)$ plane. Contour lines are in white. The variance is symmetric in $m$, minimized along $m=0$ (dashed line), increasing with $|m|$ at each fixed $t < T$, and collapsing to a common value at $T$.

Theorems & Definitions (19)

  • Remark 1.1: Role of Entropy Regularization
  • Proposition 2.2: Posterior dynamics
  • Definition 2.3: Admissible policy
  • Lemma 3.1: Optimal policy is Gaussian
  • proof
  • Remark 3.2: Mean-variance separation
  • Proposition 4.1: Polynomial impossibility
  • Proposition 4.2: Coefficient system
  • Proposition 4.3: $w$-separation
  • Proposition 4.4: Closed-form solution
  • ...and 9 more