Program-Based Strategy Induction for Reinforcement Learning
Carlos G. Correa, Thomas L. Griffiths, Nathaniel D. Daw
TL;DR
The paper addresses the gap between traditional incremental reinforcement-learning models and the discrete, heuristic strategies observed in humans and animals. It uses Bayesian program induction to infer program-structured strategies that balance simplicity and effectiveness, combining a prior over programs with a likelihood tied to the value $V(\pi)$ of a strategy $\pi$ in a given task. Applied to several bandit tasks, the framework recovers interpretable strategies such as win-stay, lose-shift (WSLS)-like rules, reward accumulators, horizon-aware exploration, and discrete decision states, offering a resource-rational explanation for adaptive behavior. The approach yields a modular, interpretable account of strategy induction that could be extended to planning and behavior analysis, and it provides an alternative to opaque neural-network-based strategy discovery.
Abstract
Typical models of learning assume incremental estimation of continuously varying decision variables like expected rewards. However, this class of models fails to capture the more idiosyncratic, discrete heuristics and strategies that people and animals appear to exhibit. Despite recent advances in strategy discovery using tools like recurrent networks that generalize the classic models, the resulting strategies are often hard to interpret, making connections to cognition difficult to establish. We use Bayesian program induction to discover strategies implemented by programs, letting the simplicity of strategies trade off against their effectiveness. Focusing on bandit tasks, we find strategies that are difficult to express or unexpected under classical incremental learning, like asymmetric learning from rewarded and unrewarded trials, adaptive horizon-dependent random exploration, and discrete state switching.
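The trade-off between simplicity and effectiveness can be illustrated with a toy sketch. It is not the authors' implementation: the three hand-written candidate strategies, their hand-assigned sizes, the two-armed Bernoulli bandit with reward probabilities 0.8 and 0.2, the exponential forms assumed for the prior and the likelihood, and the parameters `beta` and `size_penalty` are all assumptions made for illustration. Each candidate is scored by an assumed log-prior that penalizes size plus an assumed log-likelihood proportional to a Monte Carlo estimate of $V(\pi)$.

```python
import math
import random

# Hypothetical candidate strategies. Each maps the previous action and reward
# to the next action (0 or 1). None of these come from the paper's program space.
def always_left(prev_action, prev_reward, rng):
    return 0

def random_choice(prev_action, prev_reward, rng):
    return rng.randrange(2)

def win_stay_lose_shift(prev_action, prev_reward, rng):
    if prev_action is None:          # first trial: pick an arm at random
        return rng.randrange(2)
    return prev_action if prev_reward == 1 else 1 - prev_action

# name -> (strategy, size); sizes are hand-assigned stand-ins for program length.
CANDIDATES = {
    "always_left": (always_left, 1),
    "random": (random_choice, 1),
    "wsls": (win_stay_lose_shift, 3),
}

def estimate_value(strategy, n_episodes=2000, horizon=10, seed=0):
    """Monte Carlo estimate of V(pi): mean total reward per episode."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        probs = [0.8, 0.2]           # assumed task: one good arm, one bad arm
        rng.shuffle(probs)           # the good arm's position is unknown
        prev_action, prev_reward = None, None
        for _ in range(horizon):
            action = strategy(prev_action, prev_reward, rng)
            reward = 1 if rng.random() < probs[action] else 0
            total += reward
            prev_action, prev_reward = action, reward
    return total / n_episodes

def posterior(candidates, beta=1.0, size_penalty=1.0):
    """Normalized scores: exp(-size_penalty * size) * exp(beta * V(pi))."""
    log_scores = {
        name: -size_penalty * size + beta * estimate_value(fn)
        for name, (fn, size) in candidates.items()
    }
    m = max(log_scores.values())     # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {name: math.exp(s - m) / z for name, s in log_scores.items()}

if __name__ == "__main__":
    for beta in (0.5, 3.0):
        print(beta, posterior(CANDIDATES, beta=beta))
```

Under these toy settings, a small `beta` should leave most of the posterior mass on the shorter strategies, while a larger `beta` should shift it toward the win-stay, lose-shift rule, whose higher estimated value then outweighs its larger size.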
