Deep Belief Markov Models for POMDP Inference

Giacomo Arcieri, Konstantinos G. Papakonstantinou, Daniel Straub, Eleni Chatzi

TL;DR

The paper addresses efficient belief inference and planning under partial observability in POMDPs with high-dimensional state spaces. It introduces the Deep Belief Markov Model (DBMM), a neural, variational extension of Deep Markov Models that explicitly models belief propagation under actions while respecting the POMDP structure. DBMM learns a belief-transition operator and an observation model, enabling generative simulation, belief inference for RL, and planning with quantified uncertainty, without requiring a ground-truth model. Across discrete, continuous, and real-world benchmarks (including railway maintenance), DBMM learns interpretable, well-calibrated beliefs that can surpass raw observations and compete with EnKF under known models, highlighting its potential for scalable POMDP inference and decision-making.

Abstract

This work introduces a novel deep learning-based architecture, termed the Deep Belief Markov Model (DBMM), which provides efficient, model-formulation agnostic inference in Partially Observable Markov Decision Process (POMDP) problems. The POMDP framework allows for modeling and solving sequential decision-making problems under observation uncertainty. In complex, high-dimensional, partially observable environments, existing inference methods based on exact computations (e.g., via Bayes' theorem) or sampling algorithms do not scale well. Furthermore, ground-truth states may not be available for learning the exact transition dynamics. DBMMs extend deep Markov models into the partially observable decision-making framework and allow efficient belief inference based entirely on available observation data via variational inference methods. By leveraging neural networks, DBMMs can infer and simulate non-linear relationships in the system dynamics and naturally scale to problems with high dimensionality and discrete or continuous variables. In addition, the neural network parameters can be updated efficiently as new data become available. DBMMs can thus be used to infer a belief variable, enabling the derivation of POMDP solutions over the belief space. We evaluate the proposed methodology by assessing the model-formulation agnostic inference capability of DBMMs on benchmark problems that include discrete and continuous variables.
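
For context, the exact Bayes-theorem inference that the abstract contrasts with DBMMs can be sketched for a small discrete POMDP as follows. This is the standard textbook belief update, not the DBMM itself; the array layout (`T[a, s, s']`, `O[a, s', o]`), the function name `belief_update`, and the toy numbers are assumptions made only for illustration.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One exact Bayes-filter step for a discrete POMDP.

    Assumed (hypothetical) layout:
      T[a, s, s2] = P(s2 | s, a)   transition model
      O[a, s2, o] = P(o | s2, a)   observation model
    """
    # Prediction: push the current belief through the transition model.
    predicted = belief @ T[action]
    # Correction: weight each next state by the observation likelihood.
    unnormalized = predicted * O[action, :, observation]
    evidence = unnormalized.sum()
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / evidence

# Toy problem: 2 states, 1 action, 2 observations (illustrative values).
T = np.array([[[0.9, 0.1],
               [0.0, 1.0]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b0 = np.array([1.0, 0.0])
b1 = belief_update(b0, action=0, observation=1, T=T, O=O)
print(b1)
```

Each step requires a vector-matrix product over the full state space and presumes that `T` and `O` are known, which is why this exact update becomes impractical in high-dimensional settings or when no ground-truth model is available.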

Paper Structure

This paper contains 15 sections, 34 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Probabilistic graphical model of a POMDP.
  • Figure 2: Graphical representation of the Deep Markov Model. The generative model (left) is composed of two neural networks that learn the transition and the observation models. The inference model (right) is composed of an RNN that learns recurrent hidden states from (future) observations and a second neural network (termed Combiner) that infers the system's hidden states.
  • Figure 3: Graphical representation of the Deep Belief Markov Model (DBMM). The generative model (left) is composed of two neural networks that learn the belief transition operator and the observation model. The inference model (right) is composed of two neural networks that learn the belief transition and the belief inference operators.
  • Figure 4: Cross-entropy (CE) loss between the true beliefs and the hidden states (red) and between the predicted beliefs and the hidden states (black) over the evaluation loops in the discrete case. At each evaluation loop, the two CE loss values are computed over the same 500 trials, after which the model is updated on these trials.
  • Figure 5: MSE between the observations and the hidden states (red), MSE between the DBMM (mean) beliefs and the hidden states (black), and MSE between the EnKF (mean) beliefs and the hidden states (green) on the continuous benchmark.
  • ...and 3 more figures
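
The two error measures named in the captions of Figures 4 and 5 can be sketched as below. This is a minimal illustration under stated assumptions: beliefs in the discrete case are taken to be categorical distributions scored against integer state labels, and the continuous case compares point estimates (e.g., belief means) with the true states. The function names `cross_entropy` and `mse` and the sample values are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np

def cross_entropy(beliefs, states, eps=1e-12):
    """Mean cross-entropy between categorical beliefs and true state labels.

    beliefs: array of shape (n, num_states), each row sums to 1
    states:  integer array of shape (n,), the true hidden states
    """
    beliefs = np.asarray(beliefs, dtype=float)
    states = np.asarray(states, dtype=int)
    # Probability each belief assigns to the true state.
    picked = beliefs[np.arange(len(states)), states]
    # Clip to avoid log(0) when a belief rules out the true state.
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

def mse(estimates, states):
    """Mean squared error between point estimates and true states."""
    estimates = np.asarray(estimates, dtype=float)
    states = np.asarray(states, dtype=float)
    return float(np.mean((estimates - states) ** 2))

# Illustrative values only.
beliefs = np.array([[0.5, 0.5],
                    [1.0, 0.0]])
states = np.array([0, 0])
ce = cross_entropy(beliefs, states)
err = mse([1.0, 3.0], [0.0, 1.0])
print(ce, err)
```

Under this reading, a lower CE means the belief places more probability mass on the true hidden state, and a lower MSE means the estimate lies closer to the true state than the quantity it is compared against.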