Pedagogical Alignment of Large Language Models

Shashank Sonkar; Kangqi Ni; Sapana Chaudhary; Richard G. Baraniuk

TL;DR

The paper proposes novel perplexity-based metrics that quantify an LLM's tendency to provide scaffolded guidance rather than direct answers, offering a quantitative measure of pedagogical alignment, and it presents evidence that LHP methods outperform SFT at optimizing this behavior.

Abstract

Large Language Models (LLMs), when used in educational settings without pedagogical fine-tuning, often provide immediate answers rather than guiding students through the problem-solving process. This approach falls short of pedagogical best practices and limits their effectiveness as educational tools. We term the objective of training LLMs to emulate effective teaching strategies "pedagogical alignment." In this paper, we investigate Learning from Human Preferences (LHP) algorithms to achieve this alignment objective. A key challenge in this process is the scarcity of high-quality preference datasets to guide the alignment. To address this, we propose a novel approach for constructing a large-scale dataset using synthetic data generation techniques, eliminating the need for time-consuming and costly manual annotation. Leveraging this dataset, our experiments with Llama and Mistral models demonstrate that LHP methods outperform standard supervised fine-tuning (SFT), improving pedagogical alignment accuracy by 13.1% and 8.7%, respectively. Existing evaluation methods also lack quantitative metrics to adequately measure the pedagogical alignment of LLMs. To address this gap, we propose novel perplexity-based metrics that quantify LLMs' tendency to provide scaffolded guidance versus direct answers, offering a robust measure of pedagogical alignment. Our analysis provides compelling evidence for the superiority of LHP methods over SFT in optimizing LLMs' behavior, underscoring the potential of LHP methods in better aligning LLMs with educational objectives and fostering effective learning experiences. Code and models are available at https://github.com/luffycodes/Tutorbot-Spock.
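
The idea behind a perplexity-based comparison can be illustrated with a small sketch. The abstract does not give the exact metric definitions, so everything below is an assumption for illustration: the function names `perplexity` and `scaffolding_preference` are hypothetical, and the inputs are per-token log-probabilities that a model is assumed to have assigned to two candidate tutor replies (one scaffolded, one giving the answer directly).

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one response: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def scaffolding_preference(scaffolded_logprobs, direct_logprobs):
    """Hypothetical score (not the paper's definition).

    Positive when the model finds the scaffolded reply less surprising
    (lower perplexity) than the direct-answer reply.
    """
    return math.log(perplexity(direct_logprobs)) - math.log(
        perplexity(scaffolded_logprobs)
    )


# Illustrative numbers only: natural-log probabilities for each token.
scaffolded = [math.log(0.5)] * 4   # model is fairly confident in the guiding reply
direct = [math.log(0.25)] * 4      # model is less confident in the direct answer
score = scaffolding_preference(scaffolded, direct)
```

Under these assumptions, a model that has shifted toward guiding the student would tend to produce positive scores more often than a model that defaults to giving answers.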

Paper Structure (15 sections, 6 equations, 4 figures, 3 tables)

Introduction
Related Work
Algorithms for LHP
Synthetic Student Data Generation
Preference Data Generation for Pedagogical Alignment
Experiments
Dataset and Evaluation
Models and Training Details
Main Findings: SFT vs LHP
Pedagogical Shifts: Perplexity Comparison of SFT and LHP
Pedagogical Consistency Over Time: SFT vs. LHP in Extended Conversations
Effect of Beta on LHP Algorithms
Conclusion
Limitations
Ethics and Risks

Figures (4)

Figure 1: The image depicts the comparison between a traditional Large Language Model (LLM) interaction (left) and a pedagogically-aligned LLM interaction (right). The traditional LLM directly provides the user with the answer, while the pedagogically-aligned LLM guides the student to the solution by presenting a series of subproblems. This elucidates the concept of pedagogical alignment, emphasizing the transformation from direct problem-solving to a guided, scaffolded learning experience.
Figure 2: This figure shows the process of generating pedagogically-aligned preference data using the CLASS framework. The GPT-student asks a question, to which both the GPT-tutor and SFT-tutor respond. The key focus here is on the divergence in the 'Action Based on Evaluation' between the two tutors. In this example, the GPT-tutor's response is deemed more pedagogically aligned because it encourages the student to engage in critical thinking and attempt the problem again, instead of directly providing the correct answer. This action mismatch between the two tutor responses allows us to construct a preference dataset that distinguishes between the pedagogically preferred (chosen) and less effective (rejected) responses.
Figure 3: Comparison of multi-round performance for SFT vs. LHP methods (DPO, IPO, KTO) across Llama, Mistral, and Zephyr models. The graphs illustrate average accuracy over 8 conversation rounds, revealing the superior performance of LHP methods in maintaining pedagogical alignment across extended conversations.
Figure 4: Performance of LHP algorithms (DPO, IPO, and KTO) as a function of beta. Our results indicate that KTO outperforms both DPO and IPO once beta has been tuned through a hyperparameter search.
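
The pair construction described in Figure 2 and the role of beta in Figure 4 can be sketched together. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary keys (`prompt`, `gpt_action`, `sft_action`, `gpt_response`, `sft_response`) and both function names are hypothetical. The loss shown is the standard DPO objective for a single pair; IPO and KTO use different objectives and are not covered here.

```python
import math


def build_preference_pairs(turns):
    """Keep only turns where the two tutors chose different actions.

    Following the Figure 2 description, the GPT-tutor reply is treated as
    'chosen' and the SFT-tutor reply as 'rejected' on an action mismatch.
    All field names are hypothetical.
    """
    pairs = []
    for turn in turns:
        if turn["gpt_action"] != turn["sft_action"]:
            pairs.append(
                {
                    "prompt": turn["prompt"],
                    "chosen": turn["gpt_response"],
                    "rejected": turn["sft_response"],
                }
            )
    return pairs


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model.
    """
    margin = (policy_chosen_lp - ref_chosen_lp) - (
        policy_rejected_lp - ref_rejected_lp
    )
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Illustrative data only.
turns = [
    {"prompt": "q1", "gpt_action": "ask_to_retry", "sft_action": "give_answer",
     "gpt_response": "Try again: what is the first step?", "sft_response": "It is 42."},
    {"prompt": "q2", "gpt_action": "give_hint", "sft_action": "give_hint",
     "gpt_response": "Hint A", "sft_response": "Hint B"},
]
pairs = build_preference_pairs(turns)
```

In this sketch, beta scales how strongly a given log-probability margin between the chosen and rejected responses changes the loss, which is why it is a natural hyperparameter to sweep.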
