Exploring Human Behavior During Abstract Rule Inference and Problem Solving with the Cognitive Abstraction and Reasoning Corpus

Caroline Ahn; Quan Do; Leah Bakst; Michael P. Pascale; Joseph T. McGuire; Michael E. Hasselmo; Chantal E. Stern

TL;DR

This paper introduces CogARC, a diverse human-adapted subset of the Abstraction and Reasoning Corpus (ARC), which was originally developed to benchmark abstract reasoning in artificial intelligence. CogARC provides insight into how people generalize, misgeneralize, and adapt their strategies under uncertainty.

Abstract

Humans exhibit remarkable flexibility in abstract reasoning and can rapidly learn and apply rules from sparse examples. To investigate the cognitive strategies underlying this ability, we introduce the Cognitive Abstraction and Reasoning Corpus (CogARC), a diverse human-adapted subset of the Abstraction and Reasoning Corpus (ARC), which was originally developed to benchmark abstract reasoning in artificial intelligence. Across two experiments, CogARC was administered to a total of 260 human participants who freely generated solutions to 75 abstract visual reasoning problems. Success required inferring input-output rules from a small number of examples to transform the test input into one correct test output. Participants' behavior was recorded at high temporal resolution, including example viewing, edit sequences, and multi-attempt submissions. Participants were generally successful (mean accuracy across problems: ~90% in Experiment 1, n=40; ~80% in Experiment 2, n=220), but performance varied widely across problems and participants. Harder problems elicited longer deliberation times and greater divergence in solution strategies. Over the course of the task, participants initiated responses more quickly but showed a slight decline in accuracy, suggesting increased familiarity with the task structure rather than improved rule-learning ability. Importantly, incorrect solutions were also often highly convergent, even when the problem-solving trajectories differed in length and smoothness. Some trajectories progressed directly and efficiently toward a stable outcome, whereas others involved extended exploration or partial restarts before converging. Together, these findings highlight CogARC as a rich behavioral environment for studying human abstract reasoning, providing insight into how people generalize, misgeneralize, and adapt their strategies under uncertainty.

Paper Structure (32 sections, 1 equation, 15 figures, 3 tables)

Introduction
Methods
Experiment 1
Human participants
Task design
Experiment 2
Human participants
Task design
Behavioral analysis
Performance measures
Edit Sequence Analysis
Trends over time
Shared errors
Edit distance to submitted solution
Edit trajectory efficiency
...and 17 more sections

Figures (15)

Figure 1: Task interface for experiment 1. The study participants solved 75 visuospatial abstract reasoning problems on the web browser-based interface shown above. The screen was divided into three parts: input-output examples to the left, test input grid in the middle, and a test output editor that participants could manipulate to the right.
Figure 2: Two problems from the CogARC task and their task properties. Problem A is solved by drawing a blue outline around the grey tiles in the input. The core knowledge prior required to learn this rule is ‘objectness’. Problem B has a conditional rule in which any instances of fewer than 3 connected tiles of the same color in the input get re-colored to green in the output. Therefore, we can say this problem involves the core knowledge priors of ‘numbers and counting’ and ‘objectness’, and is of higher complexity than Problem A. For visualization purposes, problems are shown here in a simplified schematic format rather than the full interactive task interface used in the experiment.
Figure 3: Task schematic. A) Experiment 2 task interface. Participants could freely switch back and forth between edit view and example view. They were given feedback upon submission. B) Problem-solving flow and difficulty scoring. Participants got up to three attempts to reach the correct solution for a problem. Each participant was given a difficulty score of 1-4 for each problem based on how many submissions it took and what the outcome was.
Figure 4: Exploratory results from Experiment 1. A) Distribution of participant accuracy across all problems and attempts ($mean = 89.5\%$, $SD = 10.2\%$). Dashed line indicates the median. B) Distribution of deliberation times (latency from trial onset to first edit; $mean = 22.3 s$, $SD = 13.5 s$). Values are plotted on a $\log_{10}$ scale for visualization. Dashed line indicates the median. C) Distribution of mean difficulty scores across problems ($mean = 1.50$, $SD = 0.55$). Dashed line indicates the median. D) Positive correlation between mean deliberation time and mean difficulty score per problem (Pearson $r = 0.52$, $p < .001$). Line reflects best linear fit with 95% confidence interval indicated by the shaded region. E) Stacked bar plot showing the distribution of difficulty scores across all problems, ordered by number of first-attempt successes. F) Comparison of mean difficulty scores per problem across core knowledge categories (Obj = Objectness, Geo = Geometry and Pattern, Num = Numbers and Counting, GoD = Goal-directedness). G) Comparison of mean difficulty scores across experimenter-assigned complexity levels (** $p<0.01$).
Figure 5: Participant performance across trials. A) Distribution of number of trials completed per participant. B) Histogram showing the distribution of participant accuracy (percentage of correct trials over all trials; $mean = 80.1\%$, $SD = 16.6\%$). Dashed line shows the median. C) Distribution of deliberation times (latency from trial onset to first edit; $mean = 52.74 s$, $median = 25.0 s$, $SD = 1336.9 s$) shown on a log scale for visualization. The top 1% of values were excluded to improve readability; dashed line indicates the median.
...and 10 more figures
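
The Figure 3 caption describes a per-participant difficulty score of 1-4 based on how many submissions a problem took and what the outcome was, with up to three attempts allowed. The exact mapping is not given in the caption, so the sketch below assumes one plausible reading: a score of 1, 2, or 3 when the problem is solved on that attempt, and 4 when it is never solved. The function name `difficulty_score` and the boolean-list input are hypothetical, chosen only for illustration.

```python
from typing import Sequence

MAX_ATTEMPTS = 3  # "up to three attempts", per the Figure 3 caption


def difficulty_score(attempt_outcomes: Sequence[bool]) -> int:
    """Map one participant's attempts on one problem to a 1-4 score.

    attempt_outcomes holds one boolean per submission, in order
    (True = correct). ASSUMED mapping, not confirmed by the source:
    the score is the attempt number of the first correct submission,
    or 4 if no submission was correct.
    """
    if not 1 <= len(attempt_outcomes) <= MAX_ATTEMPTS:
        raise ValueError("expected between 1 and 3 submissions")
    for attempt_number, correct in enumerate(attempt_outcomes, start=1):
        if correct:
            return attempt_number
    return MAX_ATTEMPTS + 1  # unsolved (assumed to be the hardest score)


if __name__ == "__main__":
    print(difficulty_score([True]))
    print(difficulty_score([False, False, False]))
```

Under this assumed mapping, averaging the scores over participants for each problem would give a per-problem mean difficulty of the kind summarized in the Figure 4 caption.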
