On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games

Awni Altabaa, Zhuoran Yang

TL;DR

This paper formalizes a novel reinforcement learning model that explicitly represents the information structure, and uses it to analyze the statistical hardness of general sequential decision-making problems, characterizing that hardness via a graph-theoretic quantity of the information structure's DAG representation.

Abstract

In a sequential decision-making problem, the information structure is the description of how events in the system occurring at different points in time affect each other. Classical models of reinforcement learning (e.g., MDPs, POMDPs) assume a simple and highly regular information structure, while more general models like predictive state representations do not explicitly model the information structure. By contrast, real-world sequential decision-making problems typically involve a complex and time-varying interdependence of system variables, requiring a rich and flexible representation of information structure. In this paper, we formalize a novel reinforcement learning model which explicitly represents the information structure. We then use this model to carry out an information-structural analysis of the statistical hardness of general sequential decision-making problems, obtaining a characterization via a graph-theoretic quantity of the DAG representation of the information structure. We prove an upper bound on the sample complexity of learning a general sequential decision-making problem in terms of its information structure by exhibiting an algorithm achieving the upper bound. This recovers known tractability results and gives a novel perspective on reinforcement learning in general sequential decision-making problems, providing a systematic way of identifying new tractable classes of problems.

Paper Structure (29 sections, 30 theorems, 159 equations, 5 figures, 1 table, 2 algorithms)

Key Result

Proposition 1

Let $(X_1, \ldots, X_H)$ be any sequential decision-making problem with observation index set ${\mathcal{O}}$, action index set ${\mathcal{A}}$, and variable spaces $\left\{{\mathbb{X}}_h\right\}_{h \in [H]}$. Let $r_h = \mathrm{rank}(\bm{D}_h)$, where $\bm{D}_h$, $h \in [H]$, are the system dynamics matrices …
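The statement above is cut off, and the exact definition of $\bm{D}_h$ is not reproduced here. As a hedged illustration of why such a rank can be small, the sketch below assumes the standard PSR-style construction in the simplest special case, an action-free hidden Markov model: rows of a dynamics matrix are indexed by past observation sequences, columns by future ones, and entries are joint probabilities. Every entry factors through the latent state at the split point, so the rank cannot exceed the number of latent states. All names (`forward`, `backward`, the sizes `S`, `O`, `H`, `h`) are hypothetical choices for this example, not notation from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
S, O, H, h = 2, 3, 4, 2   # latent states, observation symbols, horizon, split point

def random_stochastic(rows, cols):
    """Random row-stochastic matrix (each row is a probability distribution)."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

mu = random_stochastic(1, S)[0]   # mu[s]     = P(s_1 = s)
T = random_stochastic(S, S)       # T[s, s2]  = P(s_{t+1} = s2 | s_t = s)
E = random_stochastic(S, O)       # E[s, o]   = P(o_t = o | s_t = s)

def forward(past):
    """alpha[s] = P(o_1..o_h = past, s_{h+1} = s)."""
    alpha = mu.copy()
    for o in past:
        alpha = (alpha * E[:, o]) @ T
    return alpha

def backward(future):
    """beta[s] = P(o_{h+1}..o_H = future | s_{h+1} = s)."""
    beta = np.ones(S)
    for o in reversed(future):
        beta = E[:, o] * (T @ beta)
    return beta

pasts = list(itertools.product(range(O), repeat=h))
futures = list(itertools.product(range(O), repeat=H - h))
# D[i, j] = P(past_i, future_j) = alpha_i . beta_j, so D = A @ B with inner size S.
D = np.array([[forward(p) @ backward(f) for f in futures] for p in pasts])

print(D.shape)                    # (9, 9)
print(np.linalg.matrix_rank(D))   # never exceeds S = 2
print(np.isclose(D.sum(), 1.0))   # True
```

The 9 × 9 matrix has rank at most 2 because it is a product of a 9 × 2 and a 2 × 9 matrix; this mirrors the captions' observation that for POMDP-like problems the information-structural state reduces to the latent Markovian state.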

Figures (5)

  • Figure 1: A depiction of the generality of our proposed models. POSTs and POSGs capture MDPs, POMDPs, Dec-POMDPs, and POMGs as special cases.
  • Figure 2: A depiction of the information structure of a (2-agent) Dec-POMDP/POMG within the POST/POSG framework. Blue nodes indicate past observables, green nodes indicate future observables, and orange nodes indicate the information-structural state ${\mathcal{I}}^\dagger$. This shows that in the case of Dec-POMDPs/POMGs, the information-structural state recovers the latent Markovian state.
  • Figure 3: An illustrative example of the information-structural state for POMDPs. Left. The DAG representation of the information structure ${\mathcal{G}}$. Right. The DAG ${\mathcal{G}}^\dagger$ is depicted by drawing the edges corresponding to the information sets of the action variables with dotted lines. The information-structural state coincides with the Markovian state $s_t$, and is depicted in red. Future observables are drawn in green, and past observables are drawn in blue.
  • Figure 4: DAG representation of various information structures. Solid edges indicate the edges in ${\mathcal{E}}^\dagger$, and light edges indicate the information sets of action variables. Grey nodes represent unobservable variables, blue nodes represent past observable variables, green nodes represent future observable variables, and red nodes represent the information-structural state ${\mathcal{I}}_h^\dagger$. To find ${\mathcal{I}}_h^\dagger$, as per the rank theorem for POSTs/POSGs, we first remove the incoming edges into the action variables, then find the minimal set among all past variables (both observable and unobservable) that $d$-separates the past observations from the future observations.
  • Figure 5: A depiction of the construction of a generalized predictive state representation for POST/POSG models.
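The recipe in the Figure 4 caption (cut the incoming edges into action variables, then search the past variables for a minimal set that $d$-separates past from future observables) can be sketched in Python. This is a hypothetical brute-force illustration, not the paper's algorithm: the helper names, the moralization-based $d$-separation test, and the use of smallest cardinality as the notion of "minimal" are assumptions. The POMDP encoding mirrors the Figure 3 example, with each action's parents set to its information set.

```python
from itertools import combinations

def ancestral_closure(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, zs):
    """Moralization test: xs and ys are d-separated by zs iff they are
    disconnected in the moral graph of the ancestral subgraph of
    xs | ys | zs after deleting zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestral_closure(parents, xs | ys | zs)
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:                          # undirected parent-child edges
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):      # "marry" co-parents of v
            adj[p].add(q)
            adj[q].add(p)
    frontier = [x for x in xs if x not in zs]
    reached = set(frontier)
    while frontier:                           # graph search that avoids zs
        v = frontier.pop()
        if v in ys:
            return False
        for w in adj[v] - zs - reached:
            reached.add(w)
            frontier.append(w)
    return True

def info_structural_state(parents, actions, past_vars, past_obs, future_obs):
    """Hypothetical helper: (1) drop incoming edges into action variables,
    (2) return a smallest subset of past variables d-separating
    past observables from future observables."""
    cut = {v: (() if v in actions else tuple(ps)) for v, ps in parents.items()}
    for k in range(len(past_vars) + 1):
        for z in combinations(past_vars, k):
            if d_separated(cut, past_obs, future_obs, z):
                return set(z)
    return None

# POMDP with three steps; each action's parents are its information set.
parents = {
    "s1": (), "o1": ("s1",), "a1": ("o1",),
    "s2": ("s1", "a1"), "o2": ("s2",), "a2": ("o1", "a1", "o2"),
    "s3": ("s2", "a2"), "o3": ("s3",), "a3": ("o1", "a1", "o2", "a2", "o3"),
}
actions = {"a1", "a2", "a3"}
state = info_structural_state(parents, actions,
                              past_vars=["s1", "o1", "a1", "s2", "o2"],
                              past_obs={"o1", "a1", "o2"},
                              future_obs={"a2", "o3", "a3"})
print(state)  # {'s2'}
```

Splitting the trajectory just after $o_2$, the search returns the latent state $s_2$, consistent with the Figure 3 caption. The subset enumeration is exponential in the number of past variables, so this is only suitable for small illustrative DAGs.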

Theorems & Definitions (68)

  • Definition 1: Rank of dynamics
  • Definition 2: Core test sets
  • Definition 3: Generalized Predictive State Representations
  • Remark 1: Generality and difference from standard PSRs
  • Proposition 1
  • Proof
  • Definition 4: Partially-Observable Sequential Team Model
  • Definition 5: Partially-Observable Sequential Game Model
  • Definition 6: Best response
  • Definition 7: Nash Equilibrium
  • ...and 58 more