Individual differences in the cognitive mechanisms of planning strategy discovery
Ruiqi He, Falk Lieder
TL;DR
The paper investigates how people discover new planning strategies by extending metacognitive reinforcement learning (MCRL) models with intrinsically generated metacognitive pseudo-rewards (PR), subjective effort valuation (SE), and termination deliberation (TD). Applying these models to planning-task data, the authors show that a majority of participants are better explained by variants that incorporate these mechanisms. For some subgroups of participants, PR and SE led to more planning and higher performance, whereas TD tended to reduce planning. Even so, none of the extended variants fully closes the gap between model and human strategy-discovery rates, which indicates that additional factors underlie human strategy discovery. The work highlights substantial individual differences in metacognitive learning and provides a framework for testing further cognitive mechanisms that support the experience-driven emergence of planning strategies.
Abstract
People employ efficient planning strategies. But how are these strategies acquired? Previous research suggests that people can discover new planning strategies by learning from reinforcement, a process known as metacognitive reinforcement learning (MCRL). Prior work has shown that MCRL models can learn new planning strategies and account for the experience-driven strategy discovery of more participants than alternative mechanisms do. However, it also revealed significant individual differences in metacognitive learning. Furthermore, when fitted to human data, these models discover strategies more slowly than humans do. In this study, we investigate whether incorporating cognitive mechanisms that might facilitate human strategy discovery can bring MCRL models closer to human performance. Specifically, we consider intrinsically generated metacognitive pseudo-rewards, subjective effort valuation, and termination deliberation. Analysis of planning-task data shows that a majority of participants used at least one of these mechanisms, with significant individual differences in their usage and varying effects on strategy discovery. Metacognitive pseudo-rewards, subjective effort valuation, and learning the value of acting without further planning were found to facilitate strategy discovery. Although these extensions provided valuable insights into individual differences and into how these mechanisms affect strategy discovery, they did not fully close the gap between model and human performance, which motivates further exploration of additional factors that people might use to discover new planning strategies.
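To make the three mechanisms more concrete, here is a minimal toy sketch. It is not the authors' model. Every name in it (`ToyConfig`, `value_of_acting`, `train`, `planned_clicks`) is hypothetical, and tabular Q-learning is swapped in for brevity rather than taken from the paper. The sketch assumes that on each trial a learner repeatedly chooses between one more planning operation ("click") and acting now ("terminate"), and that the value of acting has diminishing returns in the number of clicks. Subjective effort valuation is modelled as a multiplier `effort_weight` on the click cost. The metacognitive pseudo-reward is assumed to be potential-based shaping, with the value of acting now as the potential. The learned value of `TERMINATE` stands in, loosely, for learning the value of acting without further planning.

```python
import random
from dataclasses import dataclass

TERMINATE, CLICK = 0, 1  # hypothetical meta-level actions: act now, or plan one more step


@dataclass
class ToyConfig:
    max_clicks: int = 5              # hypothetical planning budget per trial
    click_cost: float = 1.0          # objective cost of one planning operation
    gain: float = 10.0               # hypothetical ceiling on the value of planning
    effort_weight: float = 1.0       # subjective effort valuation (SE): scales the cost
    use_pseudo_reward: bool = False  # metacognitive pseudo-rewards (PR), assumed potential-based
    alpha: float = 0.1               # learning rate
    epsilon: float = 0.1             # exploration rate
    episodes: int = 20000            # number of simulated trials


def value_of_acting(clicks: int, cfg: ToyConfig) -> float:
    """Hypothetical expected payoff of acting now; diminishing returns to planning."""
    return cfg.gain * (1.0 - 0.5 ** clicks)


def allowed_actions(clicks: int, cfg: ToyConfig) -> list[int]:
    return [TERMINATE] if clicks == cfg.max_clicks else [TERMINATE, CLICK]


def greedy(q: list[list[float]], clicks: int, cfg: ToyConfig) -> int:
    # max() returns the first maximal element, so ties favour TERMINATE.
    return max(allowed_actions(clicks, cfg), key=lambda a: q[clicks][a])


def train(cfg: ToyConfig, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    # q[n][a]: learned value of meta-level action a after n planning operations
    q = [[0.0, 0.0] for _ in range(cfg.max_clicks + 1)]
    for _ in range(cfg.episodes):
        n = 0
        while True:
            actions = allowed_actions(n, cfg)
            explore = rng.random() < cfg.epsilon
            a = rng.choice(actions) if explore else greedy(q, n, cfg)
            if a == TERMINATE:
                # Learn the value of acting without further planning. With shaping,
                # the absorbing end state has potential 0, so the terminal reward
                # value_of_acting(n) is offset by -Phi(n) = -value_of_acting(n).
                r = 0.0 if cfg.use_pseudo_reward else value_of_acting(n, cfg)
                q[n][TERMINATE] += cfg.alpha * (r - q[n][TERMINATE])
                break
            # Planning step: pay the subjectively weighted effort cost.
            r = -cfg.effort_weight * cfg.click_cost
            if cfg.use_pseudo_reward:
                # Pseudo-reward Phi(n + 1) - Phi(n), with Phi = value_of_acting.
                r += value_of_acting(n + 1, cfg) - value_of_acting(n, cfg)
            target = r + max(q[n + 1][b] for b in allowed_actions(n + 1, cfg))
            q[n][CLICK] += cfg.alpha * (target - q[n][CLICK])
            n += 1
    return q


def planned_clicks(cfg: ToyConfig, seed: int = 0) -> int:
    """Number of planning operations performed by the learned greedy policy."""
    q = train(cfg, seed)
    n = 0
    while greedy(q, n, cfg) == CLICK:
        n += 1
    return n


if __name__ == "__main__":
    for w in (0.2, 1.0, 3.0):
        for pr in (False, True):
            cfg = ToyConfig(effort_weight=w, use_pseudo_reward=pr)
            print(f"effort_weight={w}, pseudo_reward={pr}: clicks={planned_clicks(cfg)}")
```

The effort multiplier changes which policy is optimal in this toy. The marginal value of the k-th click is 10 * 0.5**k, so the optimal learner makes three clicks at `effort_weight=1.0` and one at `3.0`. Because the pseudo-reward is assumed to be potential-based, it leaves the optimal policy unchanged and only alters the feedback the learner receives along the way. Richer assumptions would be needed for it to change how much the learner plans.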
