On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games

Awni Altabaa; Zhuoran Yang

TL;DR

This paper formalizes a novel reinforcement learning model that explicitly represents the information structure, and uses this model to carry out an information-structural analysis of the statistical hardness of general sequential decision-making problems, characterizing that hardness through a graph-theoretic quantity of the directed acyclic graph (DAG) that represents the information structure.

Abstract

In a sequential decision-making problem, the information structure is the description of how events in the system occurring at different points in time affect each other. Classical models of reinforcement learning (e.g., MDPs, POMDPs) assume a simple and highly regular information structure, while more general models like predictive state representations do not explicitly model the information structure. By contrast, real-world sequential decision-making problems typically involve a complex and time-varying interdependence of system variables, requiring a rich and flexible representation of information structure. In this paper, we formalize a novel reinforcement learning model that explicitly represents the information structure. We then use this model to carry out an information-structural analysis of the statistical hardness of general sequential decision-making problems, characterizing this hardness through a graph-theoretic quantity of the directed acyclic graph (DAG) that represents the information structure. We prove an upper bound on the sample complexity of learning a general sequential decision-making problem, stated in terms of its information structure, by exhibiting an algorithm that achieves this bound. This recovers known tractability results and gives a novel perspective on reinforcement learning in general sequential decision-making problems, providing a systematic way of identifying new tractable classes of problems.

Paper Structure (29 sections, 30 theorems, 159 equations, 5 figures, 1 table, 2 algorithms)

Introduction
Overview of contributions and technical challenges
Related Work
Notation
Generic Sequential Decision-Making Problems and Generalized PSRs
Generic Sequential Decision-Making Problems
Generalized Predictive State Representations
Information Structure
Partially-Observable Sequential Teams
Partially-Observable Sequential Games
Information Structure Determines the Rank of POSTs/POSGs
Examples of Information Structures and their Rank
Constructing a PSR parameterization for POSTs and POSGs
Core test sets for POSTs/POSGs
Generalized PSR parameterization of POST/POSG
...and 14 more sections

Key Result

Proposition 1

Let $(X_1, \ldots, X_H)$ be any sequential decision-making problem with observation index set ${\mathcal{O}}$, action index set ${\mathcal{A}}$, and variable spaces $\left\{{\mathbb{X}}_h\right\}_{h \in [H]}$. Let $r_h = \mathrm{rank}(\bm{D}_h)$, where $\bm{D}_h$, $h \in [H]$, are the system dynamics matrices …
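
A minimal numerical sketch of the rank quantity $r_h = \mathrm{rank}(\bm{D}_h)$ follows. It assumes the usual predictive-state convention that rows of a dynamics matrix are indexed by past histories, columns by future trajectories, and entries are the probabilities of the observations under the intervened actions; the paper's exact indexing may differ. The toy two-step POMDP, its random parameters, and the function name `dynamics_matrix` are hypothetical illustrations, not the paper's construction. Because every entry factors through the latent state $s_2$, the rank cannot exceed the number of latent states.

```python
import itertools

import numpy as np


def dynamics_matrix(mu, T, E):
    """Step-1 dynamics matrix of a toy two-step POMDP (hypothetical setup).

    mu: initial state distribution, shape (S,)
    T:  transitions, shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    E:  emissions, shape (S, O), E[s, o] = P(o | s)

    Rows are past histories (o1, a1), columns are futures (o2, a2, o3),
    and each entry is P(o1, o2, o3 | do(a1, a2)).
    """
    n_actions = T.shape[0]
    n_obs = E.shape[1]
    histories = list(itertools.product(range(n_obs), range(n_actions)))
    futures = list(itertools.product(range(n_obs), range(n_actions), range(n_obs)))
    D = np.zeros((len(histories), len(futures)))
    for i, (o1, a1) in enumerate(histories):
        # Unnormalized belief over s2 after observing o1 and taking a1.
        past = (mu * E[:, o1]) @ T[a1]
        for j, (o2, a2, o3) in enumerate(futures):
            # Probability of the future given s2: emit o2, move under a2, emit o3.
            future = E[:, o2] * (T[a2] @ E[:, o3])
            D[i, j] = past @ future
    return D


rng = np.random.default_rng(0)
n_states, n_actions, n_obs = 2, 2, 2
mu = rng.dirichlet(np.ones(n_states))
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
E = rng.dirichlet(np.ones(n_obs), size=n_states)

D = dynamics_matrix(mu, T, E)
# D = (past factors) @ (future factors).T, so rank(D) <= n_states.
print(D.shape, np.linalg.matrix_rank(D))
```

With generic random parameters the rank typically equals the number of latent states, which is much smaller than either dimension of the matrix; this gap between rank and size is what a low-rank (PSR-style) parameterization exploits.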

Figures (5)

Figure 1: A depiction of the generality of our proposed models. POSTs and POSGs capture MDPs, POMDPs, Dec-POMDPs, and POMGs as special cases.
Figure 2: A depiction of the information structure of a (2-agent) Dec-POMDP/POMG within the POST/POSG framework. Blue nodes indicate past observables, green nodes indicate future observables, and orange nodes indicate the information-structural state ${\mathcal{I}}^\dagger$. This shows that, in the case of Dec-POMDPs/POMGs, the information-structural state recovers the latent Markovian state.
Figure 3: An illustrative example of the information-structural state for POMDPs. Left. The DAG representation of the information structure ${\mathcal{G}}$. Right. The DAG ${\mathcal{G}}^\dagger$ is depicted by drawing the edges corresponding to the information sets of the action variables with dotted lines. The information-structural state coincides with the Markovian state $s_t$, and is depicted in red. Future observables are drawn in green, and past observables are drawn in blue.
Figure 4: DAG representation of various information structures. Solid edges indicate the edges in ${\mathcal{E}}^\dagger$ and light edges indicate the information sets of action variables. Grey nodes represent unobservable variables, blue nodes represent past observable variables, green nodes represent future observable variables, and red nodes represent the information-structural state ${\mathcal{I}}_h^\dagger$. To find ${\mathcal{I}}_h^\dagger$, as per the POST/POSG rank theorem, we first remove the incoming edges into the action variables, and then find the minimal set among all past variables (both observable and unobservable) that $d$-separates the past observations from the future observations.
Figure 5: A depiction of the construction of a generalized predictive state representation for POST/POSG models.
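
The procedure described in the Figure 4 caption can be sketched in Python. This is a minimal illustration under several assumptions: the toy two-step POMDP with variables ordered $s_1, o_1, a_1, s_2, o_2, a_2, s_3, o_3$ and perfect-recall information sets is hypothetical; "past observables" is taken to include both observations and actions; and "minimal" is taken to mean fewest variables, found by brute force (exponential in the number of past variables), whereas the paper may measure size differently. The helper names `ancestors`, `d_separated`, and `info_structural_state` are hypothetical. d-separation is tested with the standard moralized-ancestral-graph criterion.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, xs, ys, zs):
    """Test whether Z d-separates X from Y (moralized ancestral graph criterion)."""
    zs = set(zs)
    xs, ys = set(xs) - zs, set(ys) - zs
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: undirected parent-child edges plus "married" co-parents.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Delete Z and look for any undirected path from X to Y.
    seen, stack = set(xs), list(xs)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v] - zs:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


def info_structural_state(parents, order, observable, actions, h):
    """Fewest variables among order[:h] that d-separate past from future observables in G-dagger."""
    # G-dagger: remove incoming edges into action variables (their information sets).
    g_dagger = {v: ([] if v in actions else list(parents.get(v, ()))) for v in order}
    past, future = order[:h], order[h:]
    past_obs = [v for v in past if v in observable]
    future_obs = [v for v in future if v in observable]
    # Brute force over subsets of increasing size; Z = all past variables always succeeds.
    for k in range(len(past) + 1):
        for z in combinations(past, k):
            if d_separated(g_dagger, past_obs, future_obs, z):
                return set(z)


# Hypothetical two-step POMDP, listed in temporal order.
order = ["s1", "o1", "a1", "s2", "o2", "a2", "s3", "o3"]
parents = {
    "o1": ["s1"],
    "a1": ["o1"],              # information set of a1
    "s2": ["s1", "a1"],
    "o2": ["s2"],
    "a2": ["o1", "a1", "o2"],  # information set of a2 (perfect recall)
    "s3": ["s2", "a2"],
    "o3": ["s3"],
}
observable = {"o1", "a1", "o2", "a2", "o3"}
actions = {"a1", "a2"}

print(info_structural_state(parents, order, observable, actions, h=4))  # {'s2'}
```

For the past $(s_1, o_1, a_1, s_2)$ the search returns $\{s_2\}$, consistent with Figure 3, where the information-structural state of a POMDP coincides with its Markovian state. Removing the information-set edges first matters: in the original graph, $o_1 \to a_2 \to s_3 \to o_3$ would bypass $s_2$.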

Theorems & Definitions (68)

Definition 1: Rank of dynamics
Definition 2: Core test sets
Definition 3: Generalized Predictive State Representations
Remark 1: Generality and difference from standard PSRs
Proposition 1
proof
Definition 4: Partially-Observable Sequential Team Model
Definition 5: Partially-Observable Sequential Game Model
Definition 6: Best response
Definition 7: Nash Equilibrium
...and 58 more
