From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education

Unggi Lee; Jiyeong Bae; Yeonji Jung; Minji Kang; Gyuri Byun; Yeonseo Lee; Dohee Kim; Sookbun Lee; Jaekwon Park; Taekyung Ahn; Gunho Lee; Hyeoncheol Kim

TL;DR

This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT), an application of Language Model-based Knowledge Tracing (LKT) to programming education that uses pre-trained language models to process learning data and outperforms existing KT and Code KT models.

Abstract

Knowledge Tracing (KT) is a critical component in online learning, but traditional approaches face limitations in interpretability and cross-domain adaptability. This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT), an innovative application of Language Model-based Knowledge Tracing (LKT) to programming education. CodeLKT leverages pre-trained language models to process learning data, demonstrating superior performance over existing KT and Code KT models. We explore Domain Adaptive Pre-Training (DAPT) and Task Adaptive Pre-Training (TAPT), showing enhanced performance in the coding domain and investigating cross-domain transfer between mathematics and coding. Additionally, we present a theoretically informed, integrated system that combines CodeLKT with large language models to generate personalized, in-depth feedback supporting students' programming learning. This work advances the field of Code Knowledge Tracing by expanding the knowledge base with a language model-based approach and by offering practical implications for programming education through data-informed feedback.

Paper Structure (23 sections, 6 equations, 2 figures, 12 tables)

Introduction
Related Work
Code Knowledge Tracing
Domain Adaptive Pre-Training in Knowledge Tracing
Automatic Feedback System for Programming Education
Method
Code Language Model-based Knowledge Tracing
Problem Definition
Language Model-based Code Knowledge Tracing
Textual Feature Extraction for Code Knowledge Tracing
Datasets
Generate Questions from Answers
Create Knowledge Concept Information
Domain Adaptation
Experiment Setup
...and 8 more sections

Figures (2)

Figure 1: The prompt templates for correctness feedback and hint feedback consist of 7 and 4 components, respectively. Because hint feedback is given when the student has not submitted an answer, its template does not contain the 'Correctness', 'Student Code (Present)', and 'Student Code AST' components, whereas the correctness feedback template does. Correctness feedback provides an answer correction when the student submitted a wrong answer, and provides 'Tips for improvement' and 'Next challenge' when the answer is correct. In both cases, correctness feedback contains 'Positive feedback', 'Answer analysis', and 'Comments for cheering up'. Hint feedback contains 'Positive feedback', 'Related past history', 'Similar problems', and 'Key notions of the problem'.
Figure 2: Pipeline for extracting question and concept information for LKT from the CSEDM-19-Spring, CSEDM-19-Fall, and CodeWorkout-Spring2019 datasets. A model trained on the domain corpus uses this data to predict the MASK token.
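
The branching described in the Figure 1 caption can be sketched as a small lookup. This is only an illustration of which feedback sections the caption lists for each case, not the authors' implementation: the section names come from the caption, while the function name `feedback_sections`, its parameters, the label "Answer correction", and the ordering of sections are assumptions made for this example.

```python
from typing import List, Optional

# Sections that the caption says appear in correctness feedback
# regardless of whether the submitted answer was right or wrong.
COMMON_CORRECTNESS_SECTIONS = [
    "Positive feedback",
    "Answer analysis",
    "Comments for cheering up",
]

# Sections that the caption lists for hint feedback.
HINT_SECTIONS = [
    "Positive feedback",
    "Related past history",
    "Similar problems",
    "Key notions of the problem",
]


def feedback_sections(submitted: bool, is_correct: Optional[bool] = None) -> List[str]:
    """Return the feedback sections for one case (hypothetical helper).

    submitted=False      -> hint feedback (no answer was submitted)
    submitted=True       -> correctness feedback; is_correct must be given
    """
    if not submitted:
        return list(HINT_SECTIONS)
    if is_correct is None:
        raise ValueError("is_correct is required when an answer was submitted")
    if is_correct:
        # Correct answer: the caption lists tips and a follow-up challenge.
        extra = ["Tips for improvement", "Next challenge"]
    else:
        # Wrong answer: the caption says an answer correction is provided.
        extra = ["Answer correction"]
    # Section order is an assumption; the caption does not specify one.
    return COMMON_CORRECTNESS_SECTIONS + extra


if __name__ == "__main__":
    print(feedback_sections(submitted=False))
    print(feedback_sections(submitted=True, is_correct=False))
    print(feedback_sections(submitted=True, is_correct=True))
```

Keeping the shared sections in one list makes the difference between the correct and incorrect cases explicit, which mirrors how the caption describes them.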
