Learning Context Matters: Measuring and Diagnosing Personalization Gaps in LLM-Based Instructional Design

Johaun Hatchett, Debshila Basu Mallick, Brittany C. Bradford, Richard G. Baraniuk

TL;DR

This paper examines whether providing Learning Context (LC) to LLM-based tutors yields truly personalized instruction. It introduces the Personalization Policy Probe (P$^3$), a framework that generates psychometrically grounded LC probes, formalizes instructional design as a policy, and uses expert judgments to measure alignment and learner-centeredness via metrics such as Total Variation Distance and policy expectations. Case studies with GPT-5.2 show that LC nudges instructional planning toward learner-centered strategies and closer to expert policies, but substantial gaps and misalignment remain, including learner features that are ignored or spuriously influential. The work provides a principled basis for evaluating LC-aware tutoring and motivates targeted improvements in learner characteristic prioritization, pedagogical model tuning, and LC engineering to achieve expert-like personalization at scale.

Abstract

The adoption of generative AI in education has accelerated dramatically in recent years, with Large Language Models (LLMs) increasingly integrated into learning environments in the hope of providing personalized support that enhances learner engagement and knowledge retention. However, truly personalized support requires access to meaningful Learning Context (LC) regarding who the learner is, what they are trying to understand, and how they are engaging with the material. In this paper, we present a framework for measuring and diagnosing how the LC influences instructional strategy selection in LLM-based tutoring systems. Using psychometrically grounded synthetic learning contexts and a pedagogically grounded decision space, we compare LLM instructional decisions in context-blind and context-aware conditions and quantify their alignment with the pedagogical judgments of subject matter experts. Our results show that, while providing the LC induces systematic, measurable changes in instructional decisions that move LLM policies closer to the subject matter expert policy, substantial misalignment remains. To diagnose this misalignment, we introduce a relevance-impact analysis that reveals which learner characteristics are attended to, ignored, or spuriously influential in LLM instructional decision-making. This analysis, conducted in collaboration with subject matter experts, demonstrates that LC materially shapes LLM instructional planning but does not reliably induce pedagogically appropriate personalization. Our results enable principled evaluation of context-aware LLM systems and provide a foundation for improving personalization through learner characteristic prioritization, pedagogical model tuning, and LC engineering.

Paper Structure (14 sections, 8 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Learning context reduces the personalization gap. Shown are learner-centeredness scores for instructional policies produced by GPT-5.2 in control (learning context-blind) and intervention (learning context-aware) conditions, and by expert instructors, across 50 synthetic learners sharing the same learning objective. A learner-centeredness score greater than 0 indicates a policy that is more learner-centered, while a score less than 0 indicates a policy that is more content-centered. Incorporating learning context consistently increases the learner-centeredness of the LLM's instructional policy relative to an objective-only baseline, but the LLM's policy remains substantially less learner-centered than expert instructional planning across all learners.
  • Figure 2: Influence of learning context on instructional policy estimates. Relative frequencies of pedagogical strategies selected by GPT-5.2 under a learning context-blind control condition, a learning context-aware intervention condition, and by expert instructors, for a learner profile characterized by moderate subject interest, high growth mindset, test anxiety, and difficulty persisting under low-interest conditions. Although inclusion of learning context shifts the LLM's instructional policy toward greater learner awareness (e.g., goal setting and monitoring), both LLM conditions remain concentrated on procedural strategies relative to the expert policy, which emphasizes a range of motivational and contextual practices. Pedagogical strategies are derived from the Digital Promise Learner Variability Navigator.
  • Figure 3: Comparison of LLM-expert policy deviation (measured by Total Variation Distance) under control (learning context-blind) and intervention (learning context-aware) conditions across learning contexts. A deviation of 0 indicates perfect alignment with expert instructional judgment. (Main figure) Points below the diagonal indicate reduced deviation from expert instructional judgment when learning context is provided. (Inset) The distribution of deviation reductions shows that every learning context exhibited decreased LLM-expert divergence under the context-aware condition.
  • Figure 4: Relationship between pedagogical relevance of learner characteristics and their observed influence on the LLM's instructional policy (measured via total variation). Characteristics fall into four regions: aligned features (high relevance, high influence), neglected features (high relevance, low influence), hallucinated relevance (low relevance, high influence), and irrelevant features (low relevance, low influence). The presence of neglected and hallucinated characteristics reveals systematic pedagogical misalignment in how learning context is utilized during instructional planning. Quadrants are defined by median influence (horizontal) and moderate relevance (vertical).
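
The quantities named in Figures 2 to 4 can be illustrated with a minimal Python sketch. It assumes an instructional policy is estimated as the relative frequency of each selected strategy (as in Figure 2), computes the Total Variation Distance between two such policies (as in Figure 3), and assigns a learner characteristic to one of the four regions of Figure 4. The function names, the placeholder strategy labels "A", "B", "C", the toy numbers, and the choice to treat values at or above a threshold as "high" are all assumptions made for illustration; they are not the paper's implementation or results.

```python
from collections import Counter


def policy_from_choices(choices, strategies):
    """Estimate a policy as the relative frequency of each strategy."""
    counts = Counter(choices)
    total = len(choices)
    return {s: counts.get(s, 0) / total for s in strategies}


def total_variation_distance(p, q):
    """TVD between two discrete distributions: half the L1 distance.

    0 means identical policies; 1 means disjoint support.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def classify_characteristic(relevance, influence, relevance_cut, influence_cut):
    """Assign a learner characteristic to a Figure 4 region.

    Assumption: a value at or above its cut counts as "high".
    """
    high_relevance = relevance >= relevance_cut
    high_influence = influence >= influence_cut
    if high_relevance and high_influence:
        return "aligned"
    if high_relevance:
        return "neglected"
    if high_influence:
        return "hallucinated relevance"
    return "irrelevant"


# Toy data with placeholder strategy labels (illustrative only).
strategies = ["A", "B", "C"]
expert = {"A": 0.2, "B": 0.4, "C": 0.4}
control = policy_from_choices(["A"] * 8 + ["B"] * 2, strategies)
intervention = policy_from_choices(["A"] * 5 + ["B"] * 3 + ["C"] * 2, strategies)

tvd_control = total_variation_distance(control, expert)
tvd_intervention = total_variation_distance(intervention, expert)

# A positive reduction corresponds to a point below the diagonal in Figure 3.
reduction = tvd_control - tvd_intervention
print(f"control={tvd_control:.2f} intervention={tvd_intervention:.2f} "
      f"reduction={reduction:.2f}")

# Hypothetical cuts standing in for "moderate relevance" and "median influence".
print(classify_characteristic(relevance=0.9, influence=0.05,
                              relevance_cut=0.5, influence_cut=0.2))
```

In this toy setup the context-aware policy sits closer to the expert policy than the context-blind one, and a characteristic with high relevance but low influence is labeled "neglected". How the thresholds are derived from expert ratings and observed influence is left open here.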