Partially observed controlled Markov chains and optimal control of the Wonham filter
Fulvia Confortola, Marco Fuhrman
TL;DR
This work addresses optimal control of finite-state Markov chains that are observed only partially, through a related process corrupted by Brownian noise. It constructs a controlled jump process with stochastic transition rates, then uses a Girsanov change of measure to recast the problem as a separated control problem whose state is the Wonham filter, yielding fully observed dynamics for the unnormalized filter state ρ. The paper proves that the separated problem is equivalent to the original one and then analyzes it: existence and properties of the Wonham-filter dynamics, viscosity-solution characterizations of the value function for both infinite and finite horizons with comparison and verification theorems, and a stochastic maximum principle that applies under broad conditions without convexity of the action set. Together, these results give a framework for partially observed control of finite-state chains, with implications for filtering-based control and decision-making under uncertainty.
Abstract
We consider a class of optimal control problems, with finite or infinite horizon, for a continuous-time Markov chain with finite state space, in which the control process affects the transition rates. We suppose that the controlled process cannot be observed, and at any time the control actions are chosen based on the observation of a related stochastic process perturbed by an exogenous Brownian motion. We describe a construction of the controlled Markov chain, having stochastic transition rates adapted to the observation filtration. By a change of probability measure of Girsanov type, we introduce the so-called separated optimal control problem, where the state is the conditional (unnormalized) distribution of the controlled Markov chain and the observation process becomes a driving Brownian motion, and we prove the equivalence with the original control problem. The controlled equations for the separated problem are an instance of the Wonham filtering equations. Next we present an analysis of the separated problem: we characterize the value function as the unique viscosity solution to the dynamic programming equations (in both the parabolic and the elliptic case), and we prove verification theorems and a version of the stochastic maximum principle in the form of necessary conditions for optimality.
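To make the separated problem more concrete, the following is a minimal numerical sketch of an unnormalized Wonham filter for a controlled two-state chain. The abstract does not state the equations, so the sketch assumes a standard textbook form, dρ = Q(a)ᵀ ρ dt + diag(h) ρ dW, where Q(a) is the controlled rate matrix, h is an observation function, and W is the observation process. The names `simulate_wonham`, `Q_of_a`, `h`, and `policy`, the specific rates, and the threshold policy are all hypothetical choices for illustration, not taken from the paper. The discretization is a simple splitting scheme (an Euler step for the drift followed by an exponential update for the observation term), chosen because it keeps ρ positive for small time steps.

```python
import numpy as np

def simulate_wonham(Q_of_a, h, policy, rho0, T=1.0, n_steps=1000, seed=0):
    """Simulate a hidden controlled chain, its noisy observation, and the
    unnormalized filter rho on a time grid (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    n = len(h)
    rho = np.asarray(rho0, dtype=float).copy()
    x = rng.choice(n, p=rho / rho.sum())      # hidden initial state
    path = np.empty((n_steps + 1, n))
    path[0] = rho
    for k in range(n_steps):
        pi = rho / rho.sum()                  # normalized filter
        a = policy(pi)                        # control uses the filter only
        Q = Q_of_a(a)                         # rate matrix, rows sum to zero
        # Observation increment: dW = h(X) dt + dB (assumed observation model).
        dW = h[x] * dt + np.sqrt(dt) * rng.standard_normal()
        # Hidden chain: one-step transition probabilities approx. I + Q dt.
        p = np.eye(n)[x] + Q[x] * dt
        x = rng.choice(n, p=p)
        # Filter update, step 1: drift d(rho) = Q^T rho dt (Euler step).
        rho = rho + dt * (Q.T @ rho)
        # Filter update, step 2: exact solution of d(rho_i) = h_i rho_i dW.
        rho = rho * np.exp(h * dW - 0.5 * h**2 * dt)
        path[k + 1] = rho
    return path

# Hypothetical two-state example: the action raises the 0 -> 1 rate.
h = np.array([0.0, 1.0])

def Q_of_a(a):
    lam = 1.0 + a   # rate from state 0 to state 1
    mu = 2.0        # rate from state 1 to state 0
    return np.array([[-lam, lam], [mu, -mu]])

def policy(pi):
    # Threshold rule on the filter: act when state 1 looks unlikely.
    return 1.0 if pi[1] < 0.5 else 0.0

path = simulate_wonham(Q_of_a, h, policy, rho0=[0.5, 0.5])
pi_T = path[-1] / path[-1].sum()   # normalized conditional distribution at T
```

The control is computed from the filter alone, which mirrors the idea that actions may depend only on observations. In the separated problem, ρ itself plays the role of a fully observed state driven by W.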
