Toward In-Context Teaching: Adapting Examples to Students' Misconceptions

Alexis Ross, Jacob Andreas

TL;DR

This paper tackles adaptive teaching by enabling a teacher to infer student misconceptions online and tailor examples to correct them. It introduces AdapT, an evaluation suite, and AToM, a probabilistic model that jointly infers student priors and optimizes teaching actions, demonstrating gains over non-adaptive baselines in both simulated and human experiments. Across fractions, verbs, and functions, AToM and language-model-based teachers show substantial potential, with GPT-4 offering complementary adaptive behavior. The work highlights both the feasibility and the challenges of online misconception inference and targeted example selection, suggesting avenues for combining structured Bayesian approaches with rich priors encoded in language models to improve real-world teaching.

Abstract

When a teacher provides examples for a student to study, these examples must be informative, enabling a student to progress from their current state toward a target concept or skill. Good teachers must therefore simultaneously infer what students already know and adapt their teaching to students' changing state of knowledge. There is increasing interest in using computational models, particularly large language models, as pedagogical tools. As students, language models in particular have shown a remarkable ability to adapt to new tasks given small numbers of examples. But how effectively can these models adapt as teachers to students of different types? To study this question, we introduce a suite of models and evaluation methods we call AdapT. AdapT has two components: (1) a collection of simulated Bayesian student models that can be used for evaluation of automated teaching methods; (2) a platform for evaluation with human students, to characterize the real-world effectiveness of these methods. We additionally introduce (3) AToM, a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs. In evaluations of simulated students across three learning domains (fraction arithmetic, English morphology, function learning), AToM systematically outperforms LLM-based and standard Bayesian teaching models. In human experiments, both AToM and LLMs outperform non-adaptive random example selection. Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
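The idea of a teacher that infers a student's prior while choosing examples can be illustrated with a toy sketch. This is not the paper's AToM implementation. The concept space, the two student types, the noise level, and every function name below are assumptions made purely for illustration. The sketch models a Bayesian student who reweights candidate concepts after each labeled example, and a teacher who (a) updates a posterior over student types from the student's guesses and (b) picks the example expected to maximize the student's belief in the target concept.

```python
# Toy sketch of adaptive teaching (hypothetical; not the paper's AToM code).
# Concepts are candidate rules mapping an integer to a boolean label.
CONCEPTS = {
    "target": lambda x: x % 2 == 0,      # concept the teacher wants to teach
    "spurious_a": lambda x: x > 3,       # misconception held by type_a students
    "spurious_b": lambda x: x % 3 == 0,  # misconception held by type_b students
}
INPUTS = [3, 4, 5, 6, 7, 8]              # examples the teacher may show
STUDENT_PRIORS = {                       # assumed priors over concepts, per student type
    "type_a": {"target": 0.1, "spurious_a": 0.8, "spurious_b": 0.1},
    "type_b": {"target": 0.1, "spurious_a": 0.1, "spurious_b": 0.8},
}
NOISE = 0.05                             # assumed probability of an inconsistent label or guess


def normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def student_update(belief, x, y):
    """Bayesian student: reweight each concept by how well it explains (x, y)."""
    return normalize({
        c: p * ((1 - NOISE) if CONCEPTS[c](x) == y else NOISE)
        for c, p in belief.items()
    })


def student_predict(belief, x):
    """The student answers using its currently most probable concept."""
    best = max(belief, key=belief.get)
    return CONCEPTS[best](x)


def choose_example(type_post, beliefs):
    """Pick the input maximizing the expected belief in the target concept,
    averaged over the teacher's posterior on student types."""
    def expected_target_belief(x):
        y = CONCEPTS["target"](x)
        return sum(type_post[t] * student_update(beliefs[t], x, y)["target"]
                   for t in type_post)
    return max(INPUTS, key=expected_target_belief)


def teach(true_type, steps=4):
    """Run a short teaching session; return the teacher's posterior over
    student types and the student's final belief over concepts."""
    type_post = {t: 1 / len(STUDENT_PRIORS) for t in STUDENT_PRIORS}
    beliefs = {t: dict(p) for t, p in STUDENT_PRIORS.items()}  # simulated students
    student = dict(STUDENT_PRIORS[true_type])                  # hidden from the teacher
    for _ in range(steps):
        x = choose_example(type_post, beliefs)
        guess = student_predict(student, x)  # observed before the label is revealed
        # Infer the student type: which simulated student would have guessed this?
        type_post = normalize({
            t: p * ((1 - NOISE) if student_predict(beliefs[t], x) == guess else NOISE)
            for t, p in type_post.items()
        })
        y = CONCEPTS["target"](x)
        student = student_update(student, x, y)
        beliefs = {t: student_update(b, x, y) for t, b in beliefs.items()}
    return type_post, student


if __name__ == "__main__":
    posterior, final_belief = teach("type_a")
    print("teacher's posterior over student types:", posterior)
    print("student's final belief over concepts:", final_belief)
```

In this toy setup no single input contradicts both spurious rules, so the teacher benefits from first identifying which misconception the student holds and then choosing examples that contradict it.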

Paper Structure

This paper contains 53 sections, 2 equations, 20 figures, and 15 tables.

Figures (20)

  • Figure 1: In the AdapT (Adaptive Teaching) evaluation framework (§\ref{s:benchmark}), a teacher selects examples to teach a target concept to a student; however, the student has prior misconceptions that are unknown to the teacher. In the fraction arithmetic task (§\ref{s:fraction_task}), some students (multiplication learner) tend to over-generalize the addition procedure of making common denominators and performing arithmetic only on numerators; others (addition learner) tend to over-generalize the multiplication procedure of applying arithmetic on both numerators and denominators. In order to teach effectively, the teacher must jointly 1) infer the student's misconceptions online by observing their behavior throughout their interaction (i.e., the teacher infers that the student is an addition generalizer after observing the prediction $\frac{1}{2} \times \frac{1}{6}\rightarrow\frac{3}{6}$), and 2) adapt to such misconceptions by selecting examples that will most efficiently correct these misconceptions (i.e., the teacher anticipates the student's new incorrect belief that all fractions with equal denominators should be treated as addition problems and selects the example $\frac{1}{5} \times \frac{2}{5}=\frac{2}{25}$ to correct it). We propose AToM, a two-part probabilistic approach that achieves adaptive teaching by maintaining explicit inferences about student priors (§\ref{s:adaptive_method}).
  • Figure 2: An overview of the tasks and student types in the AdapT (Adaptive Teaching) evaluation framework (§\ref{s:benchmark}). AdapT has three tasks: fractions, verbs, and functions. For the fraction and function tasks, a student's concept space consists of programs; for the verbs task, a student's concept space is the space of generative models corresponding to English past tense verb classes.
  • Figure 3: Area under simulated learners' learning curves, where curves plot learners' posterior beliefs in the target concept by number of datapoints. We report results by task and student type with 3 random seeds per bar. Dashed bars indicate that the true student type is assumed. Note that the y-axis for the f-learner for functions starts at 0.8, as these learners all learn the concept early on, and so differences in teaching methods are small. Error bars show min/max values across seeds. Full learning curves are shown in Figure \ref{fig:synthetic_learning_all_curves}.
  • Figure 4: Critical example selection by different teaching methods for the function task. Results are for simulated f-learners, who have a spurious belief about f that agrees with the target f on all but a few examples, as labeled. The opacity of each square corresponds to the mean value of whether the example chosen by the teaching method at that step in learning is a "critical example" (averaged across experimental conditions: seed and concepts). See §\ref{s:critical_examples} for details. We report a subset of results here; see Figure \ref{fig:critical_examples_full} for full results.
  • Figure 5: Examples selected by different teaching methods for teaching a and b in the function learning task (i.e., what wug(x) computes when it is defined). The x-axis indicates the order of the chosen example compared to other examples targeting a and b.
  • ...and 15 more figures
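
The two fraction misconceptions described in the Figure 1 caption can be made concrete with a short sketch. The function names and the tuple representation below are assumptions for illustration and do not come from the paper. Fractions are kept as unreduced (numerator, denominator) pairs so that results such as 3/6 appear exactly as written in the caption.

```python
from math import gcd

# A fraction is an unreduced (numerator, denominator) pair, so 3/6 stays 3/6.


def _common_denominator(a, b):
    """Rewrite both fractions over their least common denominator."""
    d = a[1] * b[1] // gcd(a[1], b[1])
    return a[0] * (d // a[1]), b[0] * (d // b[1]), d


def multiply_correct(a, b):
    """Correct multiplication: multiply numerators and denominators."""
    return (a[0] * b[0], a[1] * b[1])


def add_correct(a, b):
    """Correct addition: make common denominators, then add numerators."""
    na, nb, d = _common_denominator(a, b)
    return (na + nb, d)


def multiply_overgeneralizing_addition(a, b):
    """Misconception: apply the addition procedure to multiplication, i.e.
    make common denominators and operate only on the numerators."""
    na, nb, d = _common_denominator(a, b)
    return (na * nb, d)


def add_overgeneralizing_multiplication(a, b):
    """Misconception: apply the multiplication procedure to addition, i.e.
    operate on both numerators and denominators."""
    return (a[0] + b[0], a[1] + b[1])


if __name__ == "__main__":
    # The student's prediction from the caption: 1/2 x 1/6 -> 3/6.
    print(multiply_overgeneralizing_addition((1, 2), (1, 6)))
    # The teacher's corrective example: 1/5 x 2/5 = 2/25, whereas the
    # over-generalized addition procedure would give 2/5.
    print(multiply_correct((1, 5), (2, 5)))
    print(multiply_overgeneralizing_addition((1, 5), (2, 5)))
```

Under these assumptions, the example with equal denominators is informative because the correct procedure and the over-generalized one give different answers (2/25 versus 2/5) on it.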