Do Large Language Models Mentalize When They Teach?

Sevan K. Harootonian, Mark K. Ho, Thomas L. Griffiths, Yael Niv, Ilia Sucholutsky

Abstract

How do LLMs decide what to teach next: by reasoning about a learner's knowledge, or by using simpler rules of thumb? We test this in a controlled task previously used to study human teaching strategies. On each trial, a teacher LLM sees a hypothetical learner's trajectory through a reward-annotated directed graph and must reveal a single edge so the learner would choose a better path if they replanned. We run a range of LLMs as simulated teachers and fit their trial-by-trial choices with the same cognitive models used for humans: a Bayes-Optimal teacher that infers which transitions the learner is missing (inverse planning), weaker Bayesian variants, heuristic baselines (e.g., reward-based), and non-mentalizing utility models. In a baseline experiment matched to the stimuli presented to human subjects, most LLMs perform well, show little change in strategy over trials, and their graph-by-graph performance is similar to that of humans. Model comparison (BIC) shows that Bayes-Optimal teaching best explains most models' choices. When given a scaffolding intervention, models follow auxiliary inference- or reward-focused prompts, but these scaffolds do not reliably improve later teaching on heuristic-incongruent test graphs and can sometimes reduce performance. Overall, cognitive model fits provide insight into LLM tutoring policies and show that prompt compliance does not guarantee better teaching decisions.
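
To make the task concrete, the following is a minimal sketch of the planning and teaching logic, assuming a simple edge-list representation of the graph. The function names (`best_path`, `bayes_optimal_edge`), the uniform prior over knowledge states, and the exhaustive enumeration of edge subsets are illustrative assumptions that only scale to small graphs; this is not the authors' implementation.

```python
from itertools import combinations

def best_path(known_edges, rewards, start):
    """Highest-reward downward path from `start`, using only the edges
    the planner knows about; a path ends at a node with no known
    outgoing edge."""
    children = {}
    for parent, child in known_edges:
        children.setdefault(parent, []).append(child)
    best, best_val = None, float("-inf")
    stack = [([start], rewards[start])]
    while stack:
        path, val = stack.pop()
        node = path[-1]
        if node not in children:  # no known outgoing edge: trajectory ends
            if val > best_val:
                best, best_val = path, val
            continue
        for child in children[node]:
            stack.append((path + [child], val + rewards[child]))
    return best, best_val

def bayes_optimal_edge(all_edges, rewards, start, observed_path):
    """Inverse planning: enumerate knowledge states (subsets of edges,
    always including the traversed ones) under which the observed
    trajectory is the learner's best plan, then reveal the edge whose
    addition maximizes the expected value of the replanned path."""
    path_edges = list(zip(observed_path, observed_path[1:]))
    hidden = [e for e in all_edges if e not in path_edges]
    consistent = []
    for k in range(len(hidden) + 1):
        for extra in combinations(hidden, k):
            known = path_edges + list(extra)
            plan, _ = best_path(known, rewards, start)
            if plan == observed_path:
                consistent.append(known)

    def expected_value(edge):
        # Uniform prior over knowledge states consistent with the trajectory.
        return sum(best_path(known + [edge], rewards, start)[1]
                   for known in consistent) / len(consistent)

    return max(hidden, key=expected_value)

# Toy example: the learner took s -> a -> y (total reward 3), which is
# only optimal if they are missing high-value edges such as ("a", "x").
rewards = {"s": 0, "a": 2, "b": 5, "x": 10, "y": 1}
edges = [("s", "a"), ("s", "b"), ("a", "x"), ("a", "y"), ("b", "y")]
print(bayes_optimal_edge(edges, rewards, "s", ["s", "a", "y"]))  # ('a', 'x')
```

The heuristic baselines named in the abstract require no such inference over the learner's knowledge; for instance, the Reward Heuristic described in Figure 5 below scores each edge simply by the summed reward of the two nodes it connects.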

Figures (9)

  • Figure 1: Graph Teaching Task. The numeric value in each node indicates the reward obtained when the learner visits that node. The learner starts at the top node and moves down or diagonally down until reaching a terminal node. A) The learner computes and executes the best available trajectory. B) The teacher observes the learner's trajectory and selects an edge to teach. C) The learner incorporates the taught edge and computes and executes a new, better trajectory. Note: human participants and LLM teachers see only part B and must make assumptions about parts A and C.
  • Figure 2: Graph-wise performance profiles align with humans. Left: mean Teaching Score across the 20 unique graph configurations for humans (black) and each LLM. Right: Pearson correlation between each model’s graph-wise performance profile and the human profile; correlations above 0.7 indicate strong alignment with the human graph-wise performance ordering. Model colors on the left are the same as on the right. *$p<.05$, ***$p<10^{-4}$
  • Figure 3: Distribution of Teaching Scores. Individual-level average Teaching Score in the Baseline Experiment for each LLM (left) and human subjects (right). Horizontal reference lines show benchmark scores of cognitive models under an argmax policy.
  • Figure 4: Cognitive model fits in the Baseline Experiment. BIC scores across candidate teaching models (lower is better), with bars indicating the fraction of simulated teachers (left) and human subjects (right) best fit by each cognitive model; a minimal sketch of the BIC computation follows this list.
  • Figure 5: Auxiliary scaffolded edge selections. The blue line shows the probability of choosing each edge in the inference scaffolding condition, with edges ordered by the Bayes-Optimal Teacher's predicted probability that the edge is unknown to the learner (0 - most likely known, 16 - most likely unknown). The pink line shows the probability of choosing each edge in the reward scaffolding condition, with edges ordered by the value the Reward Heuristic assigns to each edge, namely the sum of the rewards of the two nodes it connects (0 - lowest value, 16 - highest value). In both cases, edges farther to the right are more consistent with the instruction for that condition; note that the edge orderings differ between the Bayes-Optimal Teacher and the Reward Heuristic, and thus between the blue and pink lines. Because three edges were marked on each trial, the y-axis shows the probability that an edge was one of the three selected (maximum: 33.3%), averaged over all trials and LLM teachers. Most LLM responses aligned with the model predictions, indicating that the LLM teachers performed the scaffolding task as intended.
  • ...and 4 more figures
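
To connect the captions above to the fitting procedure, here is a minimal sketch of a BIC comparison like the one in Figure 4, assuming each cognitive model assigns a value to every candidate edge and trial-by-trial choices are fit with a softmax policy. The inverse-temperature parameter `beta` and the reward-heuristic scoring line are illustrative assumptions, not necessarily the paper's exact parameterization.

```python
import math

def softmax_loglik(edge_values, chosen_idx, beta):
    """Log-probability of the chosen edge under a softmax over the
    values a cognitive model assigns to each candidate edge."""
    logits = [beta * v for v in edge_values]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[chosen_idx] - log_z

def bic(total_loglik, n_params, n_choices):
    """BIC = k * ln(n) - 2 * ln(L); lower values indicate a better fit."""
    return n_params * math.log(n_choices) - 2.0 * total_loglik

# Reward Heuristic values (cf. Figure 5): an edge's value is the sum of
# the rewards of the two nodes it connects.
rewards = {"s": 0, "a": 2, "b": 5, "x": 10, "y": 1}
edges = [("s", "a"), ("s", "b"), ("a", "x"), ("a", "y"), ("b", "y")]
values = [rewards[p] + rewards[c] for p, c in edges]

# A single illustrative trial in which the teacher chose ("a", "x"):
ll = softmax_loglik(values, chosen_idx=2, beta=1.0)
print(bic(ll, n_params=1, n_choices=1))  # one free parameter (beta)
```

In practice the log-likelihood would be summed over all of a teacher's trials (with `beta` fit per teacher) before computing BIC, and the model with the lowest BIC for each teacher determines the fraction-best-fit bars in Figure 4.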