What type of inference is planning?

Miguel Lázaro-Gredilla; Li Yang Ku; Kevin P. Murphy; Dileep George

TL;DR

The paper reframes planning as a distinct inference problem within a variational framework, showing that planning corresponds to its own weighting of the entropy terms, one that differs from the weightings for marginal, MAP, and MMAP inference in probabilistic graphical models. It develops a variational inference (VI) formulation of planning and an analogue of loopy belief propagation (VBP) for approximate planning in factored MDPs, and also presents a linear-programming perspective and determinization-based bounds. Theoretical analysis and experiments on synthetic MDPs and IPPC tasks show that planning-specific inference can outperform traditional inference types under moderate to high stochasticity, and they highlight the role of reactivity in planning. Overall, the variational viewpoint unifies planning methods, explains their performance tradeoffs, and yields practical, scalable algorithms.

Abstract

Multiple types of inference are available for probabilistic graphical models, e.g., marginal, maximum-a-posteriori, and even marginal maximum-a-posteriori. Which one do researchers mean when they talk about "planning as inference"? There is no consistency in the literature: different types are used, and their ability to do planning is further entangled with specific approximations or additional constraints. In this work we use the variational framework to show that, just like all commonly used types of inference correspond to different weightings of the entropy terms in the variational problem, planning corresponds exactly to a different set of weights. This means that all the tricks of variational inference are readily applicable to planning. We develop an analogue of loopy belief propagation that allows us to perform approximate planning in factored-state Markov decision processes without incurring intractability due to the exponentially large state space. The variational perspective shows that the previous types of inference for planning are only adequate in environments with low stochasticity, and allows us to characterize each type by its own merits, disentangling the type of inference from the additional approximations that its practical use requires. We validate these results empirically on synthetic MDPs and tasks posed in the International Planning Competition.
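The claim that other inference types are adequate only under low stochasticity can be illustrated with a small sketch. It compares closed-loop planning (the maximization over actions sits inside the expectation over next states) with an open-loop plan (one action sequence fixed in advance, loosely in the spirit of MMAP inference). The helper names `random_mdp`, `closed_loop_value`, and `open_loop_value`, the additive-reward setting, and the tiny random MDP are all illustrative assumptions, not the paper's code or experimental setup.

```python
import itertools
import numpy as np

def random_mdp(rng, n_states=3, n_actions=2, deterministic=False):
    """Hypothetical toy MDP: P[a, x, y] = P(y | x, a), R[a, x, y] = reward."""
    R = rng.normal(size=(n_actions, n_states, n_states))
    if deterministic:
        nxt = rng.integers(n_states, size=(n_actions, n_states))
        P = np.eye(n_states)[nxt]  # one-hot next-state distribution
    else:
        P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    return P, R

def closed_loop_value(P, R, p0, T):
    """Backward induction: the agent observes the state before each action."""
    V = np.zeros(P.shape[1])
    for _ in range(T):
        Q = np.einsum("axy,axy->ax", P, R + V[None, None, :])  # Q[a, x]
        V = Q.max(axis=0)
    return float(p0 @ V)

def open_loop_value(P, R, p0, T):
    """Best fixed action sequence, chosen before any state is observed."""
    best = -np.inf
    for seq in itertools.product(range(P.shape[0]), repeat=T):
        belief, total = p0.copy(), 0.0
        for a in seq:
            total += belief @ (P[a] * R[a]).sum(axis=1)  # expected step reward
            belief = belief @ P[a]                       # propagate state belief
        best = max(best, total)
    return float(best)

rng = np.random.default_rng(0)
p0 = np.array([1.0, 0.0, 0.0])  # known initial state
for det in (True, False):
    P, R = random_mdp(rng, deterministic=det)
    print("deterministic" if det else "stochastic",
          closed_loop_value(P, R, p0, 3), open_loop_value(P, R, p0, 3))
```

With deterministic dynamics and a known initial state the two values coincide, because nothing is learned by observing states. With stochastic dynamics the closed-loop value can only be equal or larger, and the gap reflects the value of reacting to observed states.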

Paper Structure (46 sections, 3 theorems, 65 equations, 5 figures, 2 tables)

Introduction
Background
Markov decision processes (MDPs) and notation
Variational inference
Methods
VI for standard MDPs
VI LP and VBP for factored MDPs
VBP for standard MDPs
Determinization in hindsight
The different types of inference and their adequacy for planning
Ranking inference types for planning
The stochasticity of the dynamics is key
Related work
Empirical validation
Synthetic MDPs
...and 31 more sections

Key Result

Theorem 1

Given known dynamics $P(x_{t+1}|a_t, x_t)$, an initial distribution $P(x_1)$ and reward functions $R_t(x_t, a_t, x_{t+1})$, the best exponential utility $F^{\text{planning}}_\lambda$ can be expressed as the result of a concave variational optimization problem with energy $E_\lambda(\bm{q})$ and entropy $H^\text{planning}(\bm{q})$ terms where ${\bm q}\equiv q({\bm x}, {\bm a})$ is

Figures (5)

Figure 1: Factor graphs: [Left] Standard MDP [Right] Factored MDP with sparse factor connectivity.
Figure 2: Performance of different types of inference on factored MDPs as a function of their level of stochasticity (normalized entropy). [Left] Estimation error of the best utility. Lower is better. [Right] Advantage of the next action prescribed by a method vs. optimal planning. Higher is better.
Figure 3: Cumulative rewards on 6 problem domains from the ICAPS 2011 IPPC. A small horizontal jitter was introduced in all data points for visual clarity. Each cumulative reward is averaged over 30 simulations per instance. Datasets are ordered from left to right and top to bottom by increasing normalized entropy levels. Only the last two have a significant stochasticity level >5%.
Figure 4: Correspondence between the message passing updates and the factorized MDP.
Figure 5: Cumulative rewards on 6 problem domains from the ICAPS 2011 IPPC. A small horizontal jitter was introduced in all data points for visual clarity. Each cumulative reward is averaged over 30 simulations per instance. Datasets are ordered from left to right and top to bottom by increasing normalized entropy levels. Only the last two have a significant stochasticity level >5%.

Theorems & Definitions (3)

Theorem 1: Variational formulation of planning
Corollary 1.1: Additive limit
Lemma: Additive limit for factored MDPs
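
The additive limit named in Corollary 1.1 can be checked numerically on a toy problem. The sketch below assumes the best exponential utility takes the standard risk-sensitive form $F_\lambda = \frac{1}{\lambda}\log\max_\pi \mathbb{E}_\pi[\exp(\lambda \sum_t R_t)]$ with $\lambda > 0$. The paper's defining equation is not reproduced on this page, so that form, the time-homogeneous rewards, and the function names `additive_value` and `best_exponential_utility` are assumptions made for illustration.

```python
import numpy as np

def additive_value(P, R, p0, T):
    """Standard finite-horizon value: max over policies of expected total reward."""
    V = np.zeros(P.shape[1])
    for _ in range(T):
        V = np.einsum("axy,axy->ax", P, R + V[None, None, :]).max(axis=0)
    return float(p0 @ V)

def best_exponential_utility(P, R, p0, T, lam):
    """Assumed form: (1/lam) * log max_pi E[exp(lam * total reward)], lam > 0."""
    assert lam > 0
    V = np.zeros(P.shape[1])
    for _ in range(T):
        Z = lam * (R + V[None, None, :])        # Z[a, x, y]
        m = Z.max(axis=2, keepdims=True)        # shift for a stable log-sum-exp
        Q = (m[..., 0] + np.log((P * np.exp(Z - m)).sum(axis=2))) / lam
        V = Q.max(axis=0)                       # act greedily in each state
    z0 = lam * V
    return float((z0.max() + np.log(p0 @ np.exp(z0 - z0.max()))) / lam)

# Hypothetical toy MDP: P[a, x, y] = P(y | x, a), R[a, x, y] = reward.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))
R = rng.normal(size=(2, 3, 3))
p0 = np.full(3, 1.0 / 3.0)

print("additive:", additive_value(P, R, p0, 3))
for lam in (1e-4, 0.5, 2.0):
    print("lambda =", lam, best_exponential_utility(P, R, p0, 3, lam))
```

Under these assumptions the utility approaches the additive value as $\lambda$ shrinks toward zero, and it does not decrease as $\lambda$ grows, since larger $\lambda$ places more weight on high-reward outcomes.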
