Learning Context Matters: Measuring and Diagnosing Personalization Gaps in LLM-Based Instructional Design
Johaun Hatchett, Debshila Basu Mallick, Brittany C. Bradford, Richard G. Baraniuk
TL;DR
This paper asks whether providing Learning Context (LC) to LLM-based tutors yields truly personalized instruction. It introduces the Personalization Policy Probe (P$^3$), a framework that generates psychometrically grounded LC probes, formalizes instructional design as a policy, and uses expert judgments to measure alignment and learner-centeredness with metrics such as Total Variation Distance and policy expectations. Case studies with GPT-5.2 show that LC shifts instructional planning toward learner-centered strategies and closer to expert policies, but substantial misalignment remains: some learner features are ignored, while others are spuriously influential. The work provides a principled basis for evaluating LC-aware tutoring and motivates targeted improvements in learner characteristic prioritization, pedagogical model tuning, and LC engineering to achieve expert-like personalization at scale.
Abstract
The adoption of generative AI in education has accelerated dramatically in recent years, with Large Language Models (LLMs) increasingly integrated into learning environments in the hope of providing personalized support that enhances learner engagement and knowledge retention. However, truly personalized support requires access to meaningful Learning Context (LC) regarding who the learner is, what they are trying to understand, and how they are engaging with the material. In this paper, we present a framework for measuring and diagnosing how LC influences instructional strategy selection in LLM-based tutoring systems. Using psychometrically grounded synthetic learning contexts and a pedagogically grounded decision space, we compare LLM instructional decisions in context-blind and context-aware conditions and quantify their alignment with the pedagogical judgments of subject matter experts. Our results show that, while providing LC induces systematic, measurable changes in instructional decisions that move LLM policies closer to the subject matter expert policy, substantial misalignment remains. To diagnose this misalignment, we introduce a relevance-impact analysis that reveals which learner characteristics are attended to, ignored, or spuriously influential in LLM instructional decision-making. This analysis, conducted in collaboration with subject matter experts, demonstrates that LC materially shapes LLM instructional planning but does not reliably induce pedagogically appropriate personalization. Our results enable principled evaluation of context-aware LLM systems and provide a foundation for improving personalization through learner characteristic prioritization, pedagogical model tuning, and LC engineering.
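To make the alignment comparison concrete, the following is a minimal sketch of how Total Variation Distance could be computed between an expert policy and LLM policies in the context-blind and context-aware conditions. It assumes each policy is a discrete probability distribution over instructional strategies. The strategy labels, the probabilities, and the function name `total_variation_distance` are hypothetical and chosen only for illustration; they are not taken from the paper's data or results.

```python
from typing import Dict


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total Variation Distance between two discrete distributions.

    TVD(p, q) = 0.5 * sum over strategies of |p(s) - q(s)|.
    Strategies missing from one distribution are treated as probability 0.
    """
    strategies = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in strategies)


# Hypothetical policies over three made-up instructional strategies.
# These numbers are illustrative only, not results from the paper.
expert = {
    "worked_example": 0.5,
    "socratic_questioning": 0.25,
    "direct_explanation": 0.25,
}
context_blind = {
    "worked_example": 0.25,
    "socratic_questioning": 0.25,
    "direct_explanation": 0.5,
}
context_aware = {
    "worked_example": 0.375,
    "socratic_questioning": 0.25,
    "direct_explanation": 0.375,
}

tvd_blind = total_variation_distance(context_blind, expert)
tvd_aware = total_variation_distance(context_aware, expert)

print(f"TVD(context-blind, expert) = {tvd_blind}")
print(f"TVD(context-aware, expert) = {tvd_aware}")
```

In this toy setup the context-aware policy sits closer to the expert policy than the context-blind one, yet its distance is still nonzero, which mirrors the qualitative pattern described above: movement toward the expert policy with misalignment remaining.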
