Three Dogmas of Reinforcement Learning

David Abel; Mark K. Ho; Anna Harutyunyan

TL;DR

The paper argues that reinforcement learning (RL) should be a holistic paradigm for intelligent agents, not solely an environment-centric problem-solving framework built on models such as the Markov decision process (MDP). It identifies three dogmas, the Environment Spotlight, Learning as Finding a Solution, and the Reward Hypothesis, and argues that each should be reconsidered. The authors propose an agent-centric model, view learning as continual adaptation, and urge moving beyond scalar rewards toward richer goal representations (e.g., preferences, constraints) framed by axioms. The paper closes with open questions about a canonical model of an agent, evaluation without guaranteed optimality, and broader languages for expressing objectives, with the aim of widening RL's theoretical and practical impact.

Abstract

Modern reinforcement learning has been conditioned by at least three dogmas. The first is the environment spotlight, which refers to our tendency to focus on modeling environments rather than agents. The second is our treatment of learning as finding the solution to a task, rather than adaptation. The third is the reward hypothesis, which states that all goals and purposes can be well thought of as maximization of a reward signal. These three dogmas shape much of what we think of as the science of reinforcement learning. While each of the dogmas has played an important role in developing the field, it is time we bring them to the surface and reflect on whether they belong as basic ingredients of our scientific paradigm. In order to realize the potential of reinforcement learning as a canonical frame for researching intelligent agents, we suggest that it is time we shed dogmas one and two entirely, and embrace a nuanced approach to the third.

Paper Structure (13 sections, 3 figures)

On a Paradigm for Intelligent Agents
Dogma One: The Environment Spotlight
The Alternative: Shine the Spotlight on Agents, Too.
Dogma Two: Learning as Finding a Solution
The Alternative: Learning as Adaptation.
Dogma Three: The Reward Hypothesis
The Alternative: Recognize and Embrace Nuance.
Discussion
Open Questions.
On the term "Dogma".
Inspiration.
Other Dogmas.
Conclusion.

Figures (3)

Figure 1: The first dogma, the Environment Spotlight.
Figure 2: Dogma 2: Learning as Finding a Solution.
Figure 3: The third dogma, the Reward Hypothesis. Any goal that a designer might conceive of can be well thought of in terms of the maximization of a reward signal by a learning agent.
