
Planning under Distribution Shifts with Causal POMDPs

Matteo Ceriscioli, Karthika Mohan

TL;DR

This work proposes a theoretical framework for planning under partial observability using Partially Observable Markov Decision Processes (POMDPs) formulated using causal knowledge, and shows how to maintain and update a belief over both the latent state and the underlying domain.

Abstract

In the real world, planning is often challenged by distribution shifts. A model of the environment obtained under one set of conditions may no longer be valid once the distribution of states or the environment dynamics change, which in turn causes previously learned strategies to fail. In this work, we propose a theoretical framework for planning under partial observability using Partially Observable Markov Decision Processes (POMDPs) formulated using causal knowledge. By representing shifts in the environment as interventions on this causal POMDP, the framework enables evaluating plans under hypothesized changes and actively identifying which components of the environment have been altered. We show how to maintain and update a belief over both the latent state and the underlying domain, and we prove that the value function remains piecewise linear and convex (PWLC) in this augmented belief space. Preserving the PWLC property under distribution shifts keeps planning tractable via $\alpha$-vector-based POMDP methods.
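To make the idea of a belief over both the latent state and the domain more concrete, here is a minimal sketch of a Bayes-filter step on the augmented space, followed by an $\alpha$-vector value evaluation. It is not taken from the paper: it assumes finite state, action, observation, and domain sets, a domain that stays fixed within an episode, and the array layouts `T`, `O`, and `alphas` and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def joint_belief_update(belief, T, O, action, obs):
    """One Bayes-filter step over the augmented (domain, state) space.

    Assumed (hypothetical) layouts:
      belief[d, s]    = P(domain=d, state=s | history)
      T[d, a, s, s2]  = P(s2 | s, a) in domain d
      O[d, a, s2, z]  = P(z | s2, a) in domain d
    The domain is assumed not to change between steps.
    """
    # Predict: push the state belief through each domain's own dynamics.
    predicted = np.einsum("ds,dst->dt", belief, T[:, action])
    # Correct: weight by each domain's likelihood of the observation.
    unnormalized = predicted * O[:, action, :, obs]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total


def pwlc_value(belief, alphas):
    """Value as a max over alpha-vectors, each of shape (D, S)."""
    scores = np.tensordot(alphas, belief, axes=([1, 2], [0, 1]))
    return float(np.max(scores))


# Toy example: 2 domains, 2 states, 1 action, 2 observations.
T = np.tile(np.eye(2), (2, 1, 1, 1))           # states never change
O = np.empty((2, 1, 2, 2))
O[0, 0] = [[0.9, 0.1], [0.1, 0.9]]             # nominal domain: accurate sensor
O[1, 0] = [[0.5, 0.5], [0.5, 0.5]]             # shifted domain: uninformative sensor

prior = np.array([[0.5, 0.0], [0.5, 0.0]])     # state 0 known, domain uncertain
posterior = joint_belief_update(prior, T, O, action=0, obs=1)
domain_marginal = posterior.sum(axis=1)        # approx. [1/6, 5/6]
```

In this toy setup, observing `z = 1` while the state is known to be `0` is unlikely under the nominal sensor, so probability mass moves toward the shifted domain. Because `pwlc_value` is a maximum of functions that are linear in the joint belief, it is convex in that belief by construction, which is the property the abstract refers to.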

Paper Structure (10 sections, 8 theorems, 25 equations, 1 figure)


Key Result

Proposition 1

Given a conditional distribution $P(X\mid pa(X))$ and an arbitrary target conditional distribution $P'(X\mid pa(X))$, it is possible to define a stochastic shift intervention $\sigma$ s.t. $P'(X\mid pa(X))=P(X\mid pa(X);\sigma)$.
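The existence claim in Proposition 1 can be checked numerically in the finite case. The sketch below is an illustration only: the paper's Definition 3 is not reproduced on this page, so the code assumes that a stochastic shift intervention can be modeled as a stochastic kernel applied to the output of the original mechanism. The names `shift_kernel` and `apply_shift` are hypothetical, and the construction shown (a kernel that ignores the original value and samples from the target) is one simple choice, not necessarily the one used in the paper's proof.

```python
import numpy as np


def shift_kernel(target):
    """Hypothetical kernel K[pa, x, x2] = P'(x2 | pa), independent of x.

    target[pa, x2] holds the target conditional P'(X = x2 | pa).
    """
    n_x = target.shape[1]
    return np.repeat(target[:, None, :], n_x, axis=1)


def apply_shift(original, kernel):
    """Post-intervention conditional: sum_x P(x | pa) * K[pa, x, x2]."""
    return np.einsum("px,pxy->py", original, kernel)


# Rows index parent configurations, columns index values of X.
P = np.array([[0.7, 0.3], [0.2, 0.8]])          # original P(X | pa(X))
P_target = np.array([[0.1, 0.9], [0.5, 0.5]])   # arbitrary target P'(X | pa(X))

P_shifted = apply_shift(P, shift_kernel(P_target))
```

Under this assumed model, `P_shifted` equals `P_target` for any valid `P`, because each row of `P` sums to one and the kernel does not depend on the original value of `X`.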

Figures (1)

  • Figure 1: On the left is an example of a non–time-homogeneous causal POMDP. If the causal model is faithful, that is, every edge in the causal graph corresponds to an actual direct dependency between two variables, then any change in the graph structure across timesteps implies a change in the transition function. In the center is a CID that is compatible with a time-homogeneous causal POMDP. On the right is a compact representation of a time-homogeneous causal POMDP compatible with the CID shown in the center.

Theorems & Definitions (15)

  • Definition 1: Causal Influence Diagram (Everitt, Carey, Langlois, Ortega, and Legg, 2021)
  • Definition 2: Causal POMDP
  • Definition 3: Stochastic Shift Intervention
  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Theorem 1
  • Proposition 1
  • proof
  • Proposition 2: State-Domain Joint Belief Update
  • ...and 5 more