On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency

David Cheikhi; Daniel Russo

TL;DR

This work identifies fundamental limits on the representational power of value functions for policy evaluation, linking statistical inefficiency to information loss about transition dynamics. By introducing decoupled MRPs and analyzing three linear-dynamics families, it shows that value-function representations cannot always capture essential structure, causing model-free methods like LSTD to be inefficient in certain settings while remaining competitive in others. The results argue for designing structure-aware value representations or auxiliary mechanisms to exploit known problem decompositions (e.g., decoupled components or diagonal dynamics) rather than relying on generic value-function estimators. Overall, the paper clarifies when model-free approaches may or may not be preferred and provides a roadmap for building more sample-efficient, structure-aligned policy evaluation methods.

Abstract

Identifying the trade-offs between model-based and model-free methods is a central question in reinforcement learning. Value-based methods offer substantial computational advantages and are sometimes just as statistically efficient as model-based methods. However, focusing on the core problem of policy evaluation, we show that information about the transition dynamics may be impossible to represent in the space of value functions. We explore this through a series of case studies focused on structures that arise in many important problems. In several, there is no information loss and value-based methods are as statistically efficient as model-based ones. In other closely related examples, information loss is severe and value-based methods are severely outperformed. A deeper investigation points to the limited representational power of value functions as the driver of the inefficiency, as opposed to a failure in algorithm design.
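As a small sanity check of the comparison described in the abstract, the sketch below contrasts LSTD with a plug-in model-based estimate on a one-dimensional linear system. Everything in it is an illustrative assumption rather than the paper's experimental setup: the dynamics $s' = a s + \text{noise}$, the noise-free reward $r = \theta s$, the single feature $\phi(s) = s$, and the function names. In this scalar case the two estimators coincide algebraically, since both reduce to $\sum s r / (\sum s^2 - \gamma \sum s s')$. The example says nothing about the higher-dimensional structured cases where the paper reports a gap.

```python
import random

def simulate(a, theta, n, seed=0):
    """Sample n transitions (s, r, s_next) from s' = a*s + noise, r = theta*s."""
    rng = random.Random(seed)
    s = rng.gauss(0.0, 1.0)
    data = []
    for _ in range(n):
        s_next = a * s + rng.gauss(0.0, 1.0)
        data.append((s, theta * s, s_next))
        s = s_next
    return data

def lstd_scalar(data, gamma):
    """LSTD with the single feature phi(s) = s; returns beta with V(s) ~ beta * s."""
    num = sum(s * r for s, r, _ in data)
    den = sum(s * (s - gamma * s_next) for s, _, s_next in data)
    return num / den

def model_based_scalar(data, gamma):
    """Least-squares fit of a and theta, plugged into beta = theta / (1 - gamma * a)."""
    ss = sum(s * s for s, _, _ in data)
    a_hat = sum(s * s_next for s, _, s_next in data) / ss
    theta_hat = sum(s * r for s, r, _ in data) / ss
    return theta_hat / (1.0 - gamma * a_hat)

# Hypothetical parameters chosen only for illustration.
data = simulate(a=0.9, theta=1.0, n=1000)
beta_lstd = lstd_scalar(data, gamma=0.9)
beta_mb = model_based_scalar(data, gamma=0.9)
print(beta_lstd, beta_mb)  # the two estimates agree up to floating-point error
```

The agreement here is a property of the unstructured scalar problem: the model-based estimator has no extra structural knowledge to exploit, so nothing is lost by working in value space.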

Paper Structure (22 sections, 14 theorems, 41 equations, 3 figures, 1 table)

Introduction
Puzzling case-studies on the statistical efficiency of model-free methods.
Limited representational power of value functions.
Related work
Problem formulation
Markov Reward Processes with Decoupled Transition Structure
On the importance of decoupled structures
Definition
Decoupled structure cannot be encoded in the value space
Statistical inefficiency of LSTD
Linear dynamical systems
General linear dynamics, linear rewards
Diagonal linear dynamics, linear rewards
Linear Quadratic Control
Conclusion
...and 7 more sections

Key Result

Proposition 4.2

The class of value functions of decoupled MRPs is the set of separable value functions: $\{V_M \mid M \in \mathbb{M}_D\} = \mathbb{V}_D$.
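One direction of this statement can be checked numerically on a toy example. The sketch below assumes, since Definition 4.1 is not reproduced here, that a decoupled MRP has a state made of components that transition independently and a reward equal to the sum of per-component rewards. Under that assumption it builds a two-component chain with made-up numbers and checks that the joint value function equals the sum of the component value functions. It does not check the converse inclusion, namely that every separable function is the value function of some decoupled MRP.

```python
from itertools import product

def evaluate(P, r, gamma, iters=2000):
    """Iterative policy evaluation: repeat V <- r + gamma * P V from V = 0."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[i] + gamma * sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    return V

gamma = 0.9

# Two small component chains (hypothetical transition matrices and rewards).
P1 = [[0.7, 0.3], [0.4, 0.6]]
r1 = [1.0, 0.0]
P2 = [[0.5, 0.5, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]]
r2 = [0.0, 2.0, -1.0]

# Joint chain: components move independently, rewards add up.
states = list(product(range(len(r1)), range(len(r2))))
P = [[P1[i][k] * P2[j][l] for (k, l) in states] for (i, j) in states]
r = [r1[i] + r2[j] for (i, j) in states]

V1 = evaluate(P1, r1, gamma)
V2 = evaluate(P2, r2, gamma)
V = evaluate(P, r, gamma)

# Separability check: V(s1, s2) should equal V1(s1) + V2(s2).
max_gap = max(abs(V[idx] - (V1[i] + V2[j])) for idx, (i, j) in enumerate(states))
print(max_gap)
```

The joint chain here has 6 states, while the separable representation needs only 2 + 3 numbers. The value function therefore looks the same whether or not an estimator knows that the transitions factorize.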

Figures (3)

Figure 1: Mean-squared error of LSTD and model-based estimators in three settings: when dynamics are linear and rewards are quadratic, when rewards are simplified to be linear, and when dynamics are further simplified to be diagonal.
Figure 2: Ratio of the estimated mean-squared errors of LSTD and the model-based estimator for a randomly generated decoupled MRP, using $\gamma = 0.9$ and $N = 5$. Each sample $\hat{\beta}$ was obtained from a trajectory of length $n = 1000$, and $80$ such samples were averaged to estimate the MSE.
Figure 3: Ratio of the estimated mean-squared errors of LSTD and the diagonal model-based estimator as a function of the dimension $d$. The system has dynamics $A = 0.9 I_d$ and rewards $\theta = \boldsymbol{1}_d$, with $\lambda = \gamma = 0.9$. For each $\hat{\beta}$, $100$ samples were obtained using $n = 1000$ transitions.

Theorems & Definitions (26)

Definition 1.1
Definition 1.2: Loss of information
Definition 4.1: Decoupled MRP
Proposition 4.2
proof
Theorem 4.3: Value representation loses information
proof
Theorem 5.1
Theorem 5.2
proof
...and 16 more
