Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity

Philip Amortila, Dylan J. Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi

TL;DR

The main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; the complementary positive result identifies latent pushforward coverability as a general condition that enables statistical tractability.

Abstract

Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple. However, outside of restrictive settings such as small latent spaces, the fundamental statistical requirements and algorithmic principles for reinforcement learning under latent dynamics are poorly understood. This paper addresses the question of reinforcement learning under $\textit{general}$ latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions -- that is, reductions that transform an arbitrary algorithm for the latent MDP into an algorithm that can operate on rich observations -- in two settings: one where the agent has access to hindsight observations of the latent dynamics [LADZ23], and one where the agent can estimate self-predictive latent models [SAGHCB20]. Together, our results serve as a first step toward a unified statistical and algorithmic theory for reinforcement learning under latent dynamics.
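
To make the setting concrete, the following is a minimal toy sketch of a latent MDP composed with an emission process, not the paper's construction or any of its algorithms. All names (`LatentChainMDP`, `emit`, `decode`, `run_latent_policy_on_observations`) are hypothetical, and the sketch assumes the true decoder is known and that observations are decodable (each observation reveals exactly one latent state). In the setting the abstract describes, the decoder is unknown and must be learned, which is where the statistical difficulty arises.

```python
import random


class LatentChainMDP:
    """Hypothetical toy latent MDP: a short deterministic chain of latent states."""

    def __init__(self, num_states=4, horizon=5):
        self.num_states = num_states
        self.horizon = horizon

    def reset(self):
        # Every episode starts in latent state 0.
        return 0

    def step(self, state, action):
        # Action 1 moves one step right (capped at the last state); action 0 stays.
        next_state = min(state + action, self.num_states - 1)
        # Reward 1 whenever the step lands on the last latent state.
        reward = 1.0 if next_state == self.num_states - 1 else 0.0
        return next_state, reward


def emit(state, rng, noise_bits=8):
    """Toy emission process: the observation is the latent state plus random noise bits.

    The noise makes the observation space much larger than the latent space,
    while the first component keeps the emission decodable (an assumption here).
    """
    noise = tuple(rng.randint(0, 1) for _ in range(noise_bits))
    return (state, noise)


def decode(observation):
    """Ground-truth decoder for the toy emission above (assumed known in this sketch)."""
    return observation[0]


def run_latent_policy_on_observations(mdp, latent_policy, decoder, rng):
    """Run a policy defined on latent states while the agent only sees observations."""
    state = mdp.reset()
    total_reward = 0.0
    for _ in range(mdp.horizon):
        observation = emit(state, rng)
        action = latent_policy(decoder(observation))
        state, reward = mdp.step(state, action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    always_right = lambda latent_state: 1
    print(run_latent_policy_on_observations(LatentChainMDP(), always_right, decode, rng))
```

With a known decoder, any latent-state policy lifts trivially to observations. The abstract's observable-to-latent reductions target the harder case where only a decoder class is available.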

Paper Structure

This paper contains 84 sections, 60 theorems, 307 equations, 1 figure, 5 algorithms.

Key Result

Theorem 1

For every $N \geq 4$, there exists a decoder class $\Phi$ with $|\Phi| = N$ and a family of base MDPs $\mathcal{M}_\texttt{lat}$ satisfying (i) $|\mathcal{M}_\texttt{lat}|=1$, (ii) $H \leq \mathcal{O}(\log(N))$, (iii) $|\mathcal{S}| = |\mathcal{X}| \leq N^2$, (iv) $|\mathcal{A}| = 2$, and such that

Figures (1)

  • Figure 1: Summary of statistical modularity (SM) results. ✓: SM is possible for a natural choice of $\texttt{comp}(\cdot)$ (e.g., $\texttt{poly}(|\mathcal{S}|,|\mathcal{A}|,H,\varepsilon^{-1},\log(\delta^{-1}))$ for tabular MDPs). ✗: SM is not possible with natural choices of $\texttt{comp}(\cdot)$. ?: open. $^\star$: SM is possible if willing to pay for (suboptimal) $|\mathcal{A}|$ complexity. See \ref{sec:filling-out-lb-table} for precise descriptions of each setting and our choices for their complexities.

Theorems & Definitions (75)

  • Definition 2.1: Emission process
  • Definition 2.2: Latent-dynamics MDP
  • Definition 2.3: Latent-dynamics MDP class
  • Definition 3.1: Statistical complexity
  • Definition 3.2: Statistical modularity
  • Theorem 1: Impossibility of statistical modularity
  • Definition 3.3: Pushforward coverability
  • Theorem 2: Pushforward-coverable MDPs are statistically modular
  • Definition 3.4: Mismatch functions
  • Lemma 1: MDPs with pushforward coverability admit low-dimensional embeddings
  • ...and 65 more