Impact of Connectivity on Laplacian Representations in Reinforcement Learning

Tommaso Giorgi, Pierriccardo Olivieri, Keyue Jiang, Laura Toni, Matteo Papini

TL;DR

This work proves an upper bound on the approximation error of linear value function approximation under learned spectral features of a Markov Decision Process (MDP). The bound holds for general (non-uniform) policies, without any symmetry assumptions on the induced transition kernel.

Abstract

Learning compact state representations in Markov Decision Processes (MDPs) has proven crucial for addressing the curse of dimensionality in large-scale reinforcement learning (RL) problems. Existing principled approaches leverage structural priors on the MDP by constructing state representations as linear combinations of the state-graph Laplacian eigenvectors. When the transition graph is unknown or the state space is prohibitively large, the graph spectral features can be estimated directly from sampled trajectories. In this work, we prove an upper bound on the approximation error of linear value function approximation under the learned spectral features. We show how this error scales with the algebraic connectivity of the state-graph, grounding the approximation quality in the topological structure of the MDP. We further bound the error introduced by the eigenvector estimation itself, leading to an end-to-end error decomposition across the representation learning pipeline. Additionally, our expression of the Laplacian operator for the RL setting, although equivalent to existing ones, helps avoid some common misunderstandings, examples of which we show from the literature. Our results hold for general (non-uniform) policies without any assumptions on the symmetry of the induced transition kernel. We validate our theoretical findings with numerical simulations on gridworld environments.
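
As a rough illustration of the pipeline the abstract describes, the sketch below projects an exact value function onto the first $k$ Laplacian eigenvectors of a small gridworld and records the resulting error. It rests on simplifying assumptions that are not taken from the paper: the Laplacian is the combinatorial one ($D - A$) of an undirected grid, the policy is a uniform random walk, the reward is placed in a single corner state, and the eigenvectors are exact rather than learned. The names `grid_adjacency` and `projection_error` are hypothetical.

```python
import numpy as np

def grid_adjacency(n):
    """Adjacency matrix of an n x n 4-connected grid without walls."""
    A = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:                       # edge to the right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < n:                       # edge to the neighbour below
                A[i, i + n] = A[i + n, i] = 1.0
    return A

def projection_error(A, k, gamma=0.9):
    """Return (||v - v_hat_k||_2, lambda_2) for a uniform random walk on A.

    Assumption: L = D - A (combinatorial Laplacian), which is only a
    stand-in for the operator defined in the paper.
    """
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    P = A / deg[:, None]                        # uniform random-walk kernel
    reward = np.zeros(len(A))
    reward[-1] = 1.0                            # reward in one corner state
    v = np.linalg.solve(np.eye(len(A)) - gamma * P, reward)
    U = eigvecs[:, :k]                          # first k (smoothest) eigenvectors
    v_hat = U @ (U.T @ v)                       # orthogonal projection of v
    return np.linalg.norm(v - v_hat), eigvals[1]

n = 4
A = grid_adjacency(n)
errors = [projection_error(A, k)[0] for k in range(1, n * n + 1)]
```

Because the eigenvectors returned by `np.linalg.eigh` are orthonormal, the entries of `errors` cannot increase with `k`, and the error vanishes (up to floating point) once all eigenvectors are used. How quickly it decays for small `k` is what the connectivity-dependent bound is about.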

Paper Structure (27 sections, 19 theorems, 146 equations, 5 figures)

Key Result

Theorem 3.3

Let $u_1, \ldots, u_{|\mathcal{S}|}$ be the eigenvectors of $L$ associated with eigenvalues $\lambda_1 \leq \ldots \leq \lambda_{|\mathcal{S}|}$. Denote as $\widehat{u}_1, \ldots, \widehat{u}_k$ the features produced by $\epsilon$-optimal GDO (Assumption asm:gdo) and $\widehat{v}_k$ the approximation ...

Figures (5)

  • Figure 1: Error between the true value function $v$ and its approximation. Two representations are compared: the analytical one, obtained by truncating the exact eigenvectors, and its approximation obtained by optimizing GDO.
  • Figure 2: Value of $\lambda_2$ (y-axis, log scale) as the number of obstacles (walls) increases, or equivalently as the connectivity of the graph decreases.
  • Figure 3: Error of the analytical representation as the number of eigenvectors $k$ varies. The y-axis shows the error on a log scale; the x-axis shows the cut index $k$.
  • Figure 4: Relationship between the second eigenvalue $\lambda_2$ and the error of the approximate representation obtained via GDO. The y-axis shows the error and the x-axis shows $\lambda_2$; both quantities vary with the number of walls.
  • Figure 5: Relationship between the second eigenvalue $\lambda_2$ and the error of the analytical eigenvector representation. The y-axis shows the error and the x-axis shows $\lambda_2$; both quantities vary with the number of walls.
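
The trend described for Figure 2 can be reproduced in a minimal sketch, under assumptions that are not from the paper: a wall is modelled as a removed edge between two adjacent cells of a 5 x 5 grid, walls are added in a fixed random order, and $\lambda_2$ is the second-smallest eigenvalue of the combinatorial Laplacian. The helpers `grid_edges` and `lambda_2` are hypothetical names.

```python
import numpy as np

def grid_edges(n):
    """Undirected edges of an n x n 4-connected grid."""
    edges = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                edges.append((i, i + 1))
            if r + 1 < n:
                edges.append((i, i + n))
    return edges

def lambda_2(num_nodes, edges):
    """Second-smallest eigenvalue of the combinatorial Laplacian D - A."""
    L = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return np.linalg.eigvalsh(L)[1]             # eigenvalues are sorted ascending

n = 5
edges = grid_edges(n)
rng = np.random.default_rng(0)
order = rng.permutation(len(edges))             # fixed order in which walls appear

lambda2_by_walls = []
for num_walls in range(11):
    blocked = set(order[:num_walls].tolist())   # nested sets: walls only accumulate
    kept = [e for idx, e in enumerate(edges) if idx not in blocked]
    lambda2_by_walls.append(lambda_2(n * n, kept))
```

Since each additional wall removes an edge, and removing an edge subtracts a positive semidefinite term from the Laplacian, `lambda2_by_walls` is non-increasing. It reaches zero if the walls disconnect the grid.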

Theorems & Definitions (32)

  • Theorem 3.3
  • Lemma 3.3
  • Lemma 3.3
  • Lemma 3.3
  • Proposition 4.1
  • Proposition 4.1
  • Theorem 1.1
  • proof
  • Lemma 1.0
  • proof
  • ...and 22 more