Sequential sampling without comparison to boundary through model-free reinforcement learning
Jamal Esmaily, Rani Moran, Yasser Roudi, Bahador Bahrami
TL;DR
The paper asks how agents learn when to commit in perceptual decisions without relying on an evidence-accumulation boundary. It introduces a model-free Q-learning framework with an additional Wait action, so that sequential sampling is governed by learned action values rather than a fixed threshold. The approach reproduces canonical psychometric and chronometric patterns, shows how payoff context shapes the speed-accuracy trade-off via the learned terminal state $B$, and closely matches reward-optimized solutions under many conditions. It also offers mechanisms to account for learning dynamics, observational learning, urgency, post-error slowing (PES), and volatility effects. This boundary-free RL perspective provides a unifying, testable account of decision making that can make use of otherwise discarded training data and adapt to changing contexts without explicit boundary computations.
Abstract
Although models of evidence integration to a boundary have successfully explained a wide range of behavioral and neural data in decision making under uncertainty, how animals learn and optimize the boundary remains unresolved. Here, we propose a model-free reinforcement learning algorithm for perceptual decisions under uncertainty that dispenses entirely with the concepts of decision boundary and evidence accumulation. Our model learns whether to commit to a decision given the available evidence or to continue sampling information at a cost. The model reproduces canonical features of perceptual decision making, including the dependence of accuracy and reaction time on evidence strength and the modulation of the speed-accuracy trade-off by payoff regime, among others. By unifying learning and decision making within the same framework, the model can account for unstable behavior during training as well as stabilized post-training behavior, opening the door to revisiting the extensive volumes of discarded training data in the decision science literature.
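To make the idea of learning "commit or keep sampling at a cost" concrete, the following is a minimal tabular Q-learning sketch with a Wait action. It is an illustration under stated assumptions, not the authors' implementation: the state encoding (a binned momentary sample), the Gaussian stimulus, the reward values, the hyperparameters, and the names `bin_sample` and `train` are all hypothetical choices made for this example.

```python
import random

# Two commitments plus the extra Wait action (indices 0, 1, 2).
ACTIONS = ("left", "right", "wait")


def bin_sample(x, n_bins=7, lo=-3.0, hi=3.0):
    """Map a noisy sample to one of n_bins discrete states (assumed encoding)."""
    x = min(max(x, lo), hi - 1e-9)  # clip so the index stays in range
    return int((x - lo) / (hi - lo) * n_bins)


def train(episodes=20000, alpha=0.1, gamma=1.0, eps=0.1,
          wait_cost=0.05, strength=0.5, max_steps=50, n_bins=7, seed=0):
    """Tabular Q-learning; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_bins)]
    for _ in range(episodes):
        direction = rng.choice((-1, 1))  # hidden correct answer for this trial
        s = bin_sample(rng.gauss(direction * strength, 1.0), n_bins)
        for _ in range(max_steps):
            # Epsilon-greedy action selection over the three actions.
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            if ACTIONS[a] == "wait":
                # Waiting costs a little and yields a fresh sample (next state).
                s_next = bin_sample(rng.gauss(direction * strength, 1.0), n_bins)
                target = -wait_cost + gamma * max(q[s_next])
                q[s][a] += alpha * (target - q[s][a])
                s = s_next
            else:
                # Committing ends the trial: +1 if correct, -1 otherwise.
                correct = (ACTIONS[a] == "right") == (direction == 1)
                reward = 1.0 if correct else -1.0
                q[s][a] += alpha * (reward - q[s][a])
                break
    return q
```

In this sketch no threshold is specified anywhere: whether the agent commits or waits in a given state depends only on the learned values `q[s]`, and changing `wait_cost` or the commit rewards changes how long the trained agent tends to keep sampling.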
