On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency
David Cheikhi, Daniel Russo
TL;DR
This work identifies fundamental limits on the representational power of value functions for policy evaluation, linking statistical inefficiency to information loss about transition dynamics. By introducing decoupled MRPs and analyzing three linear-dynamics families, it shows that value-function representations cannot always capture essential structure, causing model-free methods like LSTD to be inefficient in certain settings while remaining competitive in others. The results argue for designing structure-aware value representations or auxiliary mechanisms to exploit known problem decompositions (e.g., decoupled components or diagonal dynamics) rather than relying on generic value-function estimators. Overall, the paper clarifies when model-free approaches may or may not be preferred and provides a roadmap for building more sample-efficient, structure-aligned policy evaluation methods.
Abstract
Identifying the trade-offs between model-based and model-free methods is a central question in reinforcement learning. Value-based methods offer substantial computational advantages and are sometimes just as statistically efficient as model-based methods. However, focusing on the core problem of policy evaluation, we show that information about the transition dynamics may be impossible to represent in the space of value functions. We explore this through a series of case studies focused on structures that arise in many important problems. In several of them, there is no information loss, and value-based methods are as statistically efficient as model-based ones. In other, closely related examples, information loss is severe and value-based methods are substantially outperformed. A deeper investigation points to the limited representational power of value functions, rather than a failure of algorithm design, as the driver of the inefficiency.
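The sketch below is a minimal illustration of the "no information loss" case. It uses a standard tabular setting and is not one of the paper's case studies. It compares LSTD(0) with a certainty-equivalence model-based estimator on a small, randomly generated Markov reward process (MRP). With one-hot (tabular) features, the LSTD linear system factors as diag(N)(I - gamma * P_hat) w = sum of rewards, where N holds the state visit counts and P_hat is the empirical transition matrix. This is exactly the empirical model's Bellman equation, so the two estimates coincide. The MRP size, discount factor, sample count, and helper names (`simulate_mrp`, `lstd`, `model_based`) are illustrative assumptions.

```python
import numpy as np


def simulate_mrp(P, r, n_steps, rng):
    """Sample one trajectory of (state, reward, next_state) transitions from a tabular MRP."""
    n = P.shape[0]
    s = rng.integers(n)
    transitions = []
    for _ in range(n_steps):
        s_next = rng.choice(n, p=P[s])
        transitions.append((s, r[s], s_next))
        s = s_next
    return transitions


def lstd(transitions, features, gamma):
    """LSTD(0): solve A w = b with A = sum phi(s)(phi(s) - gamma*phi(s'))^T and b = sum phi(s)*reward."""
    d = features.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, rew, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * rew
    w = np.linalg.solve(A, b)
    return features @ w  # value estimate for every state


def model_based(transitions, n, gamma):
    """Certainty equivalence: estimate P_hat and r_hat from counts, then solve V = r_hat + gamma*P_hat V."""
    counts = np.zeros((n, n))
    reward_sum = np.zeros(n)
    visits = np.zeros(n)
    for s, rew, s_next in transitions:
        counts[s, s_next] += 1
        reward_sum[s] += rew
        visits[s] += 1
    P_hat = counts / visits[:, None]  # assumes every state was visited at least once
    r_hat = reward_sum / visits
    return np.linalg.solve(np.eye(n) - gamma * P_hat, r_hat)


# Hypothetical 3-state MRP with dense, strictly positive transition probabilities.
rng = np.random.default_rng(0)
n, gamma = 3, 0.9
P = rng.dirichlet(np.ones(n), size=n)
r = rng.normal(size=n)
data = simulate_mrp(P, r, 5000, rng)

v_lstd = lstd(data, np.eye(n), gamma)  # one-hot (tabular) features
v_mb = model_based(data, n, gamma)
v_true = np.linalg.solve(np.eye(n) - gamma * P, r)

print("LSTD:       ", v_lstd)
print("Model-based:", v_mb)
print("True:       ", v_true)
```

If the features are coarser than the one-hot basis, the two estimators generally differ. The sketch only shows the tabular case, in which value-based and model-based estimates agree exactly.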
