Approximate Control for Continuous-Time POMDPs

Yannick Eich, Bastian Alt, Heinz Koeppl

TL;DR

This work addresses control under partial observability in continuous time with discrete states by decoupling filtering and control. It introduces entropic matching to obtain a low-dimensional, parametric belief evolution and a QMDP-inspired control policy that scales to large state spaces. The approach is demonstrated on queueing networks, predator-prey chemical reaction networks (CRNs), and closed-loop CRNs, where it performs competitively with particle filters and shows qualitative improvements in stability and balancing. The results indicate that scalable control of continuous-time POMDPs (CT-POMDPs) is feasible for complex stochastic systems where exact filtering and optimal control are intractable.

Abstract

This work proposes a decision-making framework for partially observable systems in continuous time with discrete state and action spaces. As optimal decision-making becomes intractable for large state spaces, we employ approximation methods for the filtering and the control problem that scale well with an increasing number of states. Specifically, we approximate the high-dimensional filtering distribution by projecting it onto a parametric family of distributions, and integrate it into a control heuristic based on the fully observable system to obtain a scalable policy. We demonstrate the effectiveness of our approach on several partially observed systems, including queueing systems and chemical reaction networks.
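The control heuristic "based on the fully observable system" can be illustrated with the generic QMDP rule: weight the action values of the fully observable problem by the current belief and pick the best action. The sketch below is only an illustration of that rule for an explicit discrete belief vector; it is not the paper's implementation, which works with a parametric belief in continuous time. The function name `qmdp_action` and the small table of values are hypothetical.

```python
from typing import Sequence


def qmdp_action(belief: Sequence[float],
                q_values: Sequence[Sequence[float]]) -> int:
    """Return the action maximizing sum_s belief[s] * q_values[s][a].

    belief   -- probabilities over states (assumed to sum to 1)
    q_values -- q_values[s][a], action values of the fully observable
                problem (assumed to be precomputed elsewhere)
    """
    n_actions = len(q_values[0])
    # Belief-weighted value of each action.
    scores = [
        sum(b * q_s[a] for b, q_s in zip(belief, q_values))
        for a in range(n_actions)
    ]
    # Index of the action with the highest belief-weighted value.
    return max(range(n_actions), key=scores.__getitem__)


# Hypothetical example with 3 states and 2 actions.
q = [[1.0, 0.0],
     [0.0, 2.0],
     [0.5, 0.5]]
action_a = qmdp_action([0.7, 0.2, 0.1], q)  # belief concentrated on state 0
action_b = qmdp_action([0.2, 0.7, 0.1], q)  # belief concentrated on state 1
```

Because the rule only needs expectations under the belief, a low-dimensional parametric belief can stand in for the full probability vector whenever those expectations can be computed or approximated from its parameters.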

Paper Structure

This paper contains 27 sections, 101 equations, 8 figures, 4 tables, and 1 algorithm.

Figures (8)

  • Figure 1: A schematic description of the considered queueing problem. The decision-maker decides which queue outputs its packets to the third queue.
  • Figure 2: A sample trajectory of the queueing problem using a policy computed by the QMDP method. The upper plots compare the projection filter to a particle filter by indicating their mean and variance. The lower plot describes the actions over time.
  • Figure 3: Advantage function for the LV problem. The upper and the lower plot show the advantage function over a section of the belief space for the first and the second action, respectively.
  • Figure 4: A sample trajectory of the LV problem using a policy computed by the QMDP method. The upper plots show the evolution of the exact states and the projection filter by indicating its mean and variance. The lower plot describes the actions over time.
  • Figure 5: A schematic description of the considered CRN. Species $\mathsf{X_3}$ and $\mathsf{X_4}$ are observed exactly and are used to build an estimate of $\mathsf{X_1}$ and $\mathsf{X_2}$. The decision-maker can influence the flow between species $\mathsf{X_1}$ and $\mathsf{X_2}$.
  • ...and 3 more figures