On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency

David Cheikhi, Daniel Russo

TL;DR

This work identifies fundamental limits on the representational power of value functions for policy evaluation, linking statistical inefficiency to information loss about transition dynamics. By introducing decoupled MRPs and analyzing three linear-dynamics families, it shows that value-function representations cannot always capture essential structure, causing model-free methods like LSTD to be inefficient in certain settings while remaining competitive in others. The results argue for designing structure-aware value representations or auxiliary mechanisms to exploit known problem decompositions (e.g., decoupled components or diagonal dynamics) rather than relying on generic value-function estimators. Overall, the paper clarifies when model-free approaches may or may not be preferred and provides a roadmap for building more sample-efficient, structure-aligned policy evaluation methods.

Abstract

Identifying the trade-offs between model-based and model-free methods is a central question in reinforcement learning. Value-based methods offer substantial computational advantages and are sometimes just as statistically efficient as model-based methods. However, focusing on the core problem of policy evaluation, we show that information about the transition dynamics may be impossible to represent in the space of value functions. We explore this through a series of case studies focused on structures that arise in many important problems. In several, there is no information loss and value-based methods are as statistically efficient as model-based ones. In other closely related examples, information loss is severe and value-based methods are severely outperformed. A deeper investigation points to the limited representational power of value functions as the driver of the inefficiency, as opposed to a failure in algorithm design.

Paper Structure

This paper contains 22 sections, 14 theorems, 41 equations, 3 figures, and 1 table.

Key Result

Proposition 4.2

The class of value functions of decoupled MRPs is the set of separable value functions: $\{V_M | M \in \mathbb{M}_D \} = \mathbb{V}_D$.
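One direction of this statement can be checked numerically. The sketch below is illustrative only and rests on assumptions that this summary does not spell out: a decoupled MRP is taken to be a product of independent component chains with additive rewards, and a separable value function is taken to be a sum of per-component functions. The helper names (`random_mrp`, `value`) and the component sizes are hypothetical. It verifies that the value function of such a product chain is the sum of the component value functions; it does not address the converse inclusion.

```python
import numpy as np

gamma = 0.9
rng = np.random.default_rng(0)

def random_mrp(n_states):
    """Random row-stochastic transition matrix and reward vector (illustrative)."""
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)
    r = rng.normal(size=n_states)
    return P, r

def value(P, r):
    """Solve the Bellman equation V = r + gamma * P V."""
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)

# Two independent components (assumed form of a decoupled MRP).
P1, r1 = random_mrp(3)
P2, r2 = random_mrp(4)

# Joint chain: components transition independently, rewards add up.
# Joint state (i, j) is stored at flat index i * 4 + j.
P = np.kron(P1, P2)
r = np.kron(r1, np.ones(4)) + np.kron(np.ones(3), r2)

V_joint = value(P, r).reshape(3, 4)
V_sum = value(P1, r1)[:, None] + value(P2, r2)[None, :]

print(np.allclose(V_joint, V_sum))
```

The check works because each row of a transition matrix sums to one, so the constant factor in each Kronecker term is preserved under repeated transitions.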

Figures (3)

  • Figure 1: Mean-squared error of LSTD and model-based estimators when dynamics are linear and rewards are quadratic (fig:teaser_quad), when rewards are simplified to be linear (fig:teaser_lin), and when dynamics are further simplified to be diagonal (fig:teaser_diag).
  • Figure 2: Ratio of the estimated mean-squared errors of LSTD and the model-based estimator for a randomly generated decoupled MRP, using $\gamma = 0.9, N = 5$. Here each sample $\hat{\beta}$ was obtained using a trajectory of length $n = 1000$, and $80$ such samples were averaged to obtain an estimate of the MSE.
  • Figure 3: Ratio of the estimated mean-squared errors of LSTD and the diagonal model-based estimator as a function of the dimension $d$. The system considered is the one with dynamics $A = 0.9 I_d$ and rewards $\theta = \boldsymbol{1}_d$, with $\lambda = \gamma = 0.9$. $100$ samples of each $\hat{\beta}$ were obtained using $n = 1000$ transitions.
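
The setting of Figure 3 can be sketched in a few lines. This is a rough illustration rather than the authors' experimental code, and it makes several assumptions not stated in the captions: transitions follow $s_{t+1} = A s_t + w_t$ with standard Gaussian noise, rewards are the noiseless linear function $\theta^\top s_t$, LSTD is run with $\lambda = 0$ instead of the caption's $\lambda = 0.9$, and a single trajectory is used instead of averaging over 100 samples. All variable names are hypothetical. Under these assumptions the value function is $V(s) = \beta^\top s$ with $\beta = (I - \gamma A^\top)^{-1}\theta$.

```python
import numpy as np

def simulate(A, n, rng):
    """Roll out s_{t+1} = A s_t + w_t with w_t ~ N(0, I) (assumed noise model)."""
    d = A.shape[0]
    S = np.zeros((n + 1, d))
    for t in range(n):
        S[t + 1] = A @ S[t] + rng.normal(size=d)
    return S[:-1], S[1:]

d, n, gamma = 5, 1000, 0.9
A = 0.9 * np.eye(d)
theta = np.ones(d)
rng = np.random.default_rng(0)

S, S_next = simulate(A, n, rng)
R = S @ theta  # noiseless linear rewards (assumption)

# Ground truth: beta = (I - gamma A^T)^{-1} theta.
beta_true = np.linalg.solve(np.eye(d) - gamma * A.T, theta)

# LSTD(0) with the state itself as the feature vector.
beta_lstd = np.linalg.solve(S.T @ (S - gamma * S_next), S.T @ R)

# Full model-based estimate: least-squares fit of A and theta, then plug in.
A_hat_T = np.linalg.solve(S.T @ S, S.T @ S_next)  # estimate of A^T
theta_hat = np.linalg.solve(S.T @ S, S.T @ R)
beta_mb = np.linalg.solve(np.eye(d) - gamma * A_hat_T, theta_hat)

# Diagonal model-based estimate: fit only the diagonal entries of A.
a_hat = (S * S_next).sum(axis=0) / (S * S).sum(axis=0)
beta_diag = theta_hat / (1.0 - gamma * a_hat)

for name, b in [("LSTD(0)", beta_lstd), ("model-based", beta_mb), ("diagonal", beta_diag)]:
    print(name, np.sum((b - beta_true) ** 2))
```

In this sketch the LSTD(0) estimate and the unstructured model-based estimate coincide algebraically, since both reduce to $(S^\top S - \gamma S^\top S')^{-1} S^\top r$. Only the diagonal estimator, which uses the known structure of $A$, can differ from them.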

Theorems & Definitions (26)

  • Definition 1.1
  • Definition 1.2: Loss of information
  • Definition 4.1: Decoupled MRP
  • Proposition 4.2
  • proof
  • Theorem 4.3: Value representation loses information
  • proof
  • Theorem 5.1
  • Theorem 5.2
  • proof
  • ...and 16 more