People use fast, flat goal-directed simulation to reason about novel problems

Katherine M. Collins, Cedegao E. Zhang, Lionel Wong, Mauricio Barba da Costa, Graham Todd, Adrian Weller, Samuel J. Cheyette, Thomas L. Griffiths, Joshua B. Tenenbaum

TL;DR

The paper investigates how novices reason about novel, two-player grid games by proposing the Intuitive Gamer, a model that relies on fast, shallow, goal-directed probabilistic simulations. It integrates a one-step lookahead player with a sampling-based reasoning module and is evaluated across a large, diverse set of 121 games and multiple tasks (outcome fairness, funness, first moves, and predicting others’ moves). Across zero-shot and observed-play experiments, the Intuitive Gamer accounts for human judgments and actions better than deeper, more compute-intensive models (e.g., Expert Gamer, MCTS) while maintaining strong correlations with game-theoretic expectations. The work demonstrates that people can quickly and systematically reason about new problems with compute-efficient simulations and offers a framework for building more human-like AI that can assess whether a task is worth thinking about at all. It also provides broad datasets and methodological tools for studying explainable, human-like reasoning in novel problem spaces, with implications for AI design and evaluation in unfamiliar domains.
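The TL;DR mentions a one-step lookahead player, and the Figure 2 caption further down describes it as goal-directed (advance your own goal, block the opponent's) with softmax action selection. The Python sketch below illustrates that idea under loudly stated assumptions. The game is a hypothetical n-in-a-row grid game (`win_len` in a row wins). The `run_length` progress measure, the equal weights, and the win/block bonuses are illustrative choices, not the authors' value function. Setting `temperature=0` gives a greedy variant, similar in spirit to the deterministic lesion described in the Figure 3 caption.

```python
import math
import random

# Four line directions on a grid: horizontal, vertical, and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_length(board, r, c, player):
    """Longest straight line `player` would own through empty cell (r, c).

    Hypothetical progress measure: it inspects only the board one move ahead
    (S_{t+1}) and never searches deeper.
    """
    rows, cols = len(board), len(board[0])
    best = 1
    for dr, dc in DIRECTIONS:
        count = 1  # the cell being placed
        for sign in (1, -1):  # walk outward in both directions
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < rows and 0 <= cc < cols and board[rr][cc] == player:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        best = max(best, count)
    return best


def action_values(board, player, win_len, w_own=1.0, w_block=1.0):
    """Goal-directed value of each legal move: own progress plus opponent blocking.

    Weights and bonuses are illustrative assumptions, not fitted parameters.
    """
    opponent = 3 - player  # players are coded 1 and 2; 0 marks an empty cell
    values = {}
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell != 0:
                continue
            own = run_length(board, r, c, player)
            block = run_length(board, r, c, opponent)
            value = (w_own * own + w_block * block) / win_len
            if own >= win_len:  # completing one's own goal dominates
                value += 10.0
            elif block >= win_len:  # otherwise, stopping an immediate loss matters most
                value += 5.0
            values[(r, c)] = value
    return values


def choose_action(board, player, win_len, temperature, rng):
    """Sample a move from a softmax over one-step values (greedy if temperature <= 0)."""
    values = action_values(board, player, win_len)
    actions = list(values)
    best = max(values.values())
    if temperature <= 0:  # deterministic argmax, ties broken at random
        return rng.choice([a for a in actions if values[a] == best])
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp((values[a] - best) / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

For example, on the board `[[1, 1, 0], [2, 2, 0], [0, 0, 0]]` a greedy player 1 completes its row at `(0, 2)`, while at a positive temperature lower-valued moves are still occasionally sampled.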

Abstract

Games have long been a microcosm for studying planning and reasoning in both natural and artificial intelligence, especially with a focus on expert-level or even super-human play. But real life also pushes human intelligence along a different frontier, requiring people to flexibly navigate decision-making problems that they have never thought about before. Here, we use novice gameplay to study how people make decisions and form judgments in new problem settings. We show that people are systematic and adaptively rational in how they play a game for the first time, or evaluate a game (e.g., how fair or how fun it is likely to be) before they have played it even once. We explain these capacities via a computational cognitive model that we call the "Intuitive Gamer". The model is based on mechanisms of fast and flat (depth-limited) goal-directed probabilistic simulation--analogous to those used in Monte Carlo tree-search models of expert game-play, but scaled down to use very few stochastic samples, simple goal heuristics for evaluating actions, and no deep search. In a series of large-scale behavioral studies with over 1000 participants and 121 two-player strategic board games (almost all novel to our participants), our model quantitatively captures human judgments and decisions varying the amount and kind of experience people have with a game--from no experience at all ("just thinking"), to a single round of play, to indirect experience watching another person and predicting how they should play--and does so significantly better than much more compute-intensive expert-level models. More broadly, our work offers new insights into how people rapidly evaluate, act, and make suggestions when encountering novel problems, and could inform the design of more flexible and human-like AI systems that can determine not just how to solve new tasks, but whether a task is worth thinking about at all.
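The reasoning module in the abstract can be sketched the same way. It estimates a game's expected payoff by averaging the outcomes of only a few ($k$) self-play matches between copies of a one-step, goal-directed, softmax player. This Python code is a minimal sketch, not the authors' implementation, and it rests on several assumptions:

- The game is a hypothetical n-in-a-row grid game.
- Payoffs are coded from the first player's perspective (+1 win, 0 draw, -1 loss).
- The heuristic player is illustrative.
- The optional `partial=True` mode samples a stopping time uniformly from 1 to the number of cells and counts unfinished matches as draws. This is our reading of the partial-simulation rule in the figure captions.

The default `k=6` mirrors the simulation budget quoted in the Figure 3 caption.

```python
import math
import random

DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_length(board, r, c, player):
    """Longest straight line `player` would own through empty cell (r, c)."""
    rows, cols = len(board), len(board[0])
    best = 1
    for dr, dc in DIRECTIONS:
        count = 1
        for sign in (1, -1):
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < rows and 0 <= cc < cols and board[rr][cc] == player:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        best = max(best, count)
    return best


def choose_move(board, player, win_len, temperature, rng):
    """Flat (one-step), goal-directed, softmax move choice; temperature must be > 0."""
    moves, values = [], []
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == 0:
                own = run_length(board, r, c, player)
                block = run_length(board, r, c, 3 - player)
                value = (own + block) / win_len
                if own >= win_len:
                    value += 10.0
                elif block >= win_len:
                    value += 5.0
                moves.append((r, c))
                values.append(value)
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(moves, weights=weights, k=1)[0]


def simulate_match(rows, cols, win_len, temperature, rng, max_moves=None):
    """One self-play match: +1 if the first player wins, -1 if the second wins, else 0."""
    board = [[0] * cols for _ in range(rows)]
    player = 1
    limit = rows * cols if max_moves is None else min(max_moves, rows * cols)
    for _ in range(limit):
        r, c = choose_move(board, player, win_len, temperature, rng)
        won = run_length(board, r, c, player) >= win_len  # checked before placing
        board[r][c] = player
        if won:
            return 1 if player == 1 else -1
        player = 3 - player
    return 0  # board full, or stopped early with no winner: counted as a draw


def estimate_payoff(rows, cols, win_len, k=6, temperature=0.5, partial=False, seed=None):
    """Mean payoff over only k self-play simulations (a small compute budget)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(k):
        # Assumed partial-simulation rule: stop at a time drawn uniformly from 1..#cells.
        stop = rng.randint(1, rows * cols) if partial else None
        total += simulate_match(rows, cols, win_len, temperature, rng, stop)
    return total / k
```

For example, `estimate_payoff(3, 3, 3, k=6, seed=0)` returns one noisy payoff estimate for tic-tac-toe. Calling it again with different seeds yields the judgment-to-judgment variability that small $k$ produces.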

Paper Structure

This paper contains 68 sections, 12 equations, 48 figures, 14 tables, 10 algorithms.

Figures (48)

  • Figure 1: Our novel game dataset and suite of game tasks. a, Ten example games from our $121$-game dataset. Games vary in board size and rules, such as what it takes to win and how many pieces a player can place on their opening move. b-d, We assess people's reasoning about novel games through three behavioral studies designed to test b, how people reason about games before they play even a single game; c, how people decide what actions to take in their first instance of play; and d, how people predict how others should play when watching them play.
  • Figure 1: Variance of human and model payoff predictions with varied compute budget (number of simulations, $k$) for the Intuitive Gamer model. We compute the variance over the approximately $20$ human payoff judgments per game and compare it against the average variance over simulated sets of $N=20$ simulated participants, each drawing $k$ simulations from the model. Model variance is binned into quintiles along the horizontal axis, and the average human variance is computed for the points in each bin (black dots overlaid on the scatterplot). Error bars are the standard deviation of the variance for points in each bin.
  • Figure 2: The Intuitive Gamer model, compared to prior models of game reasoning. a, Prior work modeling expert gameplay often involves deep tree search to determine what move to make given a board state $S_t$ (silver2016mastering; van2023expertise). It is unlikely that novice human reasoners conduct such computationally expensive search and state evaluation before deciding whether to engage with the problem at all. b, What might novice reasoners be doing instead? Gameplay agents can differ in the amount of compute and expertise they bring to bear on any game (differing in search depth and value sophistication). Our proposal is that people reasoning about problems with which they have no experience sit at the lower end of this spectrum, but not the lowest. c, The Intuitive Gamer conducts depth-limited search with game-general, abstract, goal-directed value functions that do not yet encode game-specific features. Specifically, the Intuitive Gamer gameplay agent conducts no more than a single step of lookahead when deciding what action ($A_{t+1}$) to take from a given board state ($S_t$). The Intuitive Gamer is goal-directed, assessing whether an action would advance the player's own goal (purple) and how much it would block progress toward the opponent's goal (yellow). These operations are "flat": they look only one step ahead. The final action is selected probabilistically by sampling from a softmax distribution over the estimated values of the actions. d, To reason about a new game query $\psi$ for a given game description $\mathcal{G}$, we posit that people run only a few ($k$) self-play simulations of the gameplay agent (as depicted in panel c) to answer the query; these simulations can run to termination or stop early probabilistically.
Taken together, the Intuitive Gamer model is fast (low $k$), flat (low search depth), goal-directed (in the value function), and probabilistic (in action selection), and it involves mental simulations of gameplay.
  • Figure 2: Partial game simulations in the Intuitive Gamer reasoning module. All main results are reported under the assumption that the game reasoning module runs simulations to the end. Although most games do not take a large number of moves to end under the Intuitive Gamer player module, it is possible that people engage in even cheaper partial simulations when evaluating novel games for the first time. a, Exploratory analyses show that running the Intuitive Gamer reasoning module under a probabilistic stopping rule captures the majority of the explainable variance in human payoff judgments about as well as running it with full simulations (see Figure \ref{fig:splash-judge}a). For direct comparison with the full-simulation model, $k=6$ partial simulations were run to estimate game payoff. The partial simulations were computed as follows: for each simulated match of a game, a stopping time was sampled from a uniform distribution over the board size, and the match was terminated early if it had not ended by that time. If a match terminated before a win or draw was called, it counted as a draw. b, To assess the impact of compute budget (the number of simulations used to estimate payoff), the variance of payoff under different numbers of partial simulations $k$ was again compared with the variance in people's payoff evaluations (as in Figure \ref{fig:splash-judge}c and Extended Data Figure \ref{fig:vary-think-k-dev}). With partial simulations, the best-fitting number of samples ($k$) is again greater than one and less than ten, and perhaps slightly lower (e.g., $k=4$) than with full game simulations (see Figure \ref{fig:splash-judge}c).
  • Figure 3: Evaluating games without ever playing. $238$ participants judged the expected payoff of games drawn from our $121$-game suite. a, The expected payoff computed under the Intuitive Gamer model captures human predictions well compared to alternative models that scale compute up or down. Each point represents the payoff for one of the $n=121$ game stimuli. Error bars depict 95% CIs around the mean estimated payoff from people and, for each model, over $20$ simulated participants each drawing $k=6$ simulations. b, Comparing the full Intuitive Gamer against lesions of one or more critical components. Lesioning flatness (increasing depth to $d \approx 3$), the probabilistic nature of the game simulations (i.e., setting the temperature of action selection to zero), and goal-directedness (ablating components of the value function) all degrade fit relative to the full Intuitive Gamer model. The dashed line indicates the mean and 95% CI of the split-half $R^2$ on the human-predicted payoffs, indicating that the full Intuitive Gamer model captures essentially all of the explainable variance not due to noise. c, Only a small number of simulations ($k \approx 5$ to $7$, which we take as $k=6$) are needed to capture the variance in participants' judgments well, as measured by (1) the root mean squared error (RMSE) between participants' variance per game and the variance over model simulations per game, and (2) the Wasserstein distance between the distributions of variances across games, which allows finer-grained distinctions among the low variances that arise at high $k$. Full scatterplots are shown in Extended Data Figure \ref{fig:vary-think-k-dev}. d, People's predictions are reasonable relative to the game-theoretic optimal values, but game-theoretic values capture them less well than the Intuitive Gamer predictions do. Points show the average payoff per game for the $78$ of $121$ games where an optimal payoff can be computed (either analytically or approximately with MCTS).
Error bars depict 95% CIs around the bootstrapped mean human prediction per game.
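The captions above describe a variance-matching analysis. It compares the spread of about 20 human payoff judgments per game with the average spread across simulated sets of $N=20$ participants, each averaging $k$ simulations, and scores the fit by RMSE and by a Wasserstein distance between the distributions of per-game variances. The standard-library Python sketch below illustrates that procedure; it is not the authors' analysis code. `payoff_sampler` is a hypothetical stand-in for one Intuitive Gamer self-play match with fixed win/draw probabilities, and `n_sets=200` is an arbitrary choice. The Wasserstein distance is computed as the area between the two empirical CDFs.

```python
import random
import statistics
from bisect import bisect_right


def payoff_sampler(p_win, p_draw):
    """Hypothetical stand-in for one self-play match: returns +1, 0, or -1."""
    def sample(rng):
        u = rng.random()
        if u < p_win:
            return 1
        return 0 if u < p_win + p_draw else -1
    return sample


def model_variance(sampler, k, n_participants=20, n_sets=200, seed=0):
    """Average across-participant variance of payoff estimates that each use k simulations."""
    rng = random.Random(seed)
    per_set = []
    for _ in range(n_sets):
        # Each simulated participant reports the mean of their own k simulations.
        estimates = [statistics.fmean([sampler(rng) for _ in range(k)])
                     for _ in range(n_participants)]
        per_set.append(statistics.variance(estimates))
    return statistics.fmean(per_set)


def rmse(xs, ys):
    """Root mean squared error between paired per-game variances."""
    return (sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D samples: area between their empirical CDFs."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right), so integrate the gap exactly.
        gap = bisect_right(u, left) / len(u) - bisect_right(v, left) / len(v)
        total += abs(gap) * (right - left)
    return total


def best_k(human_variances, samplers, ks=range(1, 11)):
    """Return the k with the lowest RMSE, plus (RMSE, Wasserstein) scores for every k."""
    scores = {}
    for k in ks:
        model = [model_variance(s, k) for s in samplers]
        scores[k] = (rmse(human_variances, model), wasserstein_1d(human_variances, model))
    return min(scores, key=lambda k: scores[k][0]), scores
```

As a sanity check on the procedure (not a result), setting synthetic "human" variances to each game's outcome variance divided by 6 should make the lowest-RMSE $k$ land near 6. With real data, `human_variances` would be the per-game variances of participants' judgments, and the sampler would be actual model self-play.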