Improving Socratic Question Generation using Data Augmentation and Preference Optimization

Nischal Ashok Kumar, Andrew Lan

TL;DR

This work tackles the difficulty of generating valid Socratic questions with LLMs by introducing a two-phase framework: data augmentation to produce realistic invalid questions, and preference optimization to align an open-source model via direct preference optimization (DPO). By augmenting a code-debugging dialogue dataset with negatives drawn from several categories of invalid question and training the model to prefer the ground-truth question over each negative, the authors show that a 7B Llama 2 model fine-tuned with DPO can avoid invalid questions, outperform larger proprietary models on ROUGE-L, and approach GPT-4 quality on related metrics. The method demonstrates that open-source, privacy-preserving models can deliver high-quality Socratic questioning suitable for programming education, with broader implications for scalable tutoring and adaptive feedback. The work also provides detailed ablations and case studies, and outlines a path for extending to larger models, more nuanced invalid-question types, and human-in-the-loop evaluation to further validate educational impact.

Abstract

The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution. Although this method has been shown to significantly improve student learning outcomes, it remains a complex, labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., those that directly reveal the solution to the problem or provide irrelevant or premature questions. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as Llama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized 7B Llama 2 model can effectively avoid generating invalid questions and, as a result, outperforms existing state-of-the-art prompting methods.
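
The two phases described in the abstract can be illustrated with a minimal sketch. It assumes the standard per-pair DPO objective and uses plain Python floats in place of model log-probabilities. The names `PreferencePair`, `build_pairs`, and `dpo_loss`, as well as the default `beta=0.1`, are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str    # dialogue context shown to the model
    chosen: str    # ground-truth Socratic question
    rejected: str  # generated invalid question (e.g. one that reveals the fix)


def build_pairs(prompt, ground_truth, invalid_questions):
    """Pair the ground-truth question with each invalid question (illustrative)."""
    return [PreferencePair(prompt, ground_truth, q) for q in invalid_questions]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard per-pair DPO loss: -log(sigmoid(beta * (chosen gap - rejected gap))).

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model. beta=0.1 is an assumed
    illustrative value, not a setting reported in this summary.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference assign identical log-probabilities, the loss is log 2; it falls as the policy raises the ground-truth question's likelihood relative to the invalid one.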

Paper Structure (16 sections, 2 equations, 1 figure, 6 tables)

This paper contains 16 sections, 2 equations, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Illustration of our method for LLM-based Socratic question generation, which consists of two phases: data augmentation and preference optimization.