Model-Free RL Agents Demonstrate System 1-Like Intentionality
Hal Ashton, Matija Franklin
TL;DR
The paper investigates whether model-free reinforcement learning (MF-RL) agents can exhibit intentionality despite lacking explicit planning, framing MF-RL as analogous to System 1 thinking and model-based RL as analogous to System 2 reasoning. It argues that intentionality does not require deliberative planning and that MF agents can act on goals embedded in their learned policies, although explaining those actions requires access to a world model or to the agent’s training context. Drawing on legal, psychological, and safety literature, the authors discuss how intent bears on responsibility, deception, and regulation for AI systems. They propose Safe RL shields as a practical mechanism for instantiating System 2-like control, enabling safer deployment and better explainability, while acknowledging the need to understand the context of an agent’s training environment.
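To make the System 1 / System 2 split and the role of a shield concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. A model-free agent picks the action with the highest cached value and does no lookahead (the System 1 analogue). A shield then uses a transition model to simulate each candidate action one step ahead and vetoes any action predicted to reach an unsafe state (a System 2-like check). Everything here is hypothetical and chosen for illustration: the function names `model_free_action` and `shielded_action`, the toy five-state corridor with a cliff at state 4, the `step` model, and the Q-values. Falling back to the next-best safe action is one simple substitution rule assumed for this sketch. Shields in the Safe RL literature are usually synthesised from a formal safety specification, so the hand-written one-step check below is only a stand-in.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = str
QTable = Dict[Tuple[State, Action], float]


def model_free_action(q: QTable, state: State, actions: List[Action]) -> Action:
    """System 1 analogue: act greedily on cached values, with no lookahead."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def shielded_action(
    q: QTable,
    state: State,
    actions: List[Action],
    model: Callable[[State, Action], State],
    is_unsafe: Callable[[State], bool],
) -> Action:
    """System 2-like check: simulate each candidate one step ahead with a model
    and return the highest-valued action whose predicted next state is safe."""
    ranked = sorted(actions, key=lambda a: q.get((state, a), 0.0), reverse=True)
    for action in ranked:
        if not is_unsafe(model(state, action)):
            return action
    raise RuntimeError(f"no safe action available in state {state}")


# Hypothetical toy corridor: states 0..4, where state 4 is a cliff.
ACTIONS = ["left", "stay", "right"]


def step(state: State, action: Action) -> State:
    # Assumed deterministic transition model available to the shield.
    delta = {"left": -1, "stay": 0, "right": 1}[action]
    return max(0, min(4, state + delta))


def is_unsafe(state: State) -> bool:
    return state == 4


# Hypothetical learned values that favour stepping off the cliff from state 3.
q_table: QTable = {(3, "right"): 1.0, (3, "stay"): 0.5, (3, "left"): 0.2}

print(model_free_action(q_table, 3, ACTIONS))                 # right
print(shielded_action(q_table, 3, ACTIONS, step, is_unsafe))  # stay
```

The unshielded policy simply acts on what training rewarded, while the shield's behaviour depends entirely on the model it is given. This mirrors the point above that explaining or constraining a model-free agent requires knowledge of a world model or of its training context.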
Abstract
This paper argues that model-free reinforcement learning (RL) agents, while lacking explicit planning mechanisms, exhibit behaviours that can be analogised to System 1 ("thinking fast") processes in human cognition. Unlike model-based RL agents, which resemble System 2 ("thinking slow") reasoning in that they plan using internal representations of their environment, model-free agents react to environmental stimuli without anticipatory modelling. We propose a novel framework linking the System 1/System 2 dichotomy to the distinction between model-free and model-based RL. This framing challenges the prevailing assumption that intentionality and purposeful behaviour require planning, suggesting instead that intentionality can manifest in the structured, reactive behaviour of model-free agents. Drawing on interdisciplinary insights from cognitive psychology, legal theory, and experimental jurisprudence, we explore the implications of this perspective for attributing responsibility and ensuring AI safety. These insights support a broader, contextually informed interpretation of intentionality in RL systems, with implications for their ethical deployment and regulation.
