
From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education

Unggi Lee, Jiyeong Bae, Yeonji Jung, Minji Kang, Gyuri Byun, Yeonseo Lee, Dohee Kim, Sookbun Lee, Jaekwon Park, Taekyung Ahn, Gunho Lee, Hyeoncheol Kim

TL;DR

This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT), an innovative application of Language model-based Knowledge Tracing to programming education that leverages pre-trained language models to process learning data, demonstrating superior performance over existing KT and Code KT models.

Abstract

Knowledge Tracing (KT) is a critical component in online learning, but traditional approaches face limitations in interpretability and cross-domain adaptability. This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT), an innovative application of Language model-based Knowledge Tracing (LKT) to programming education. CodeLKT leverages pre-trained language models to process learning data, demonstrating superior performance over existing KT and Code KT models. We explore Domain Adaptive Pre-Training (DAPT) and Task Adaptive Pre-Training (TAPT), showing enhanced performance in the coding domain and investigating cross-domain transfer between mathematics and coding. Additionally, we present a theoretically informed integrated system combining CodeLKT with large language models to generate personalized, in-depth feedback to support students' programming learning. This work advances the field of Code Knowledge Tracing by expanding the knowledge base with a language model-based approach and offering practical implications for programming education through data-informed feedback.

Paper Structure (23 sections, 6 equations, 2 figures, 12 tables)

This paper contains 23 sections, 6 equations, 2 figures, 12 tables.

Figures (2)

  • Figure 1: The prompt templates for correctness feedback and hint feedback consist of 7 and 4 components, respectively. Because hint feedback is given when the student has not submitted an answer, its template does not contain the 'Correctness', 'Student Code (Present)', and 'Student Code AST' components, whereas the correctness feedback template does. Correctness feedback provides an answer correction when the student submitted a wrong answer, and 'Tips for improvement' and 'Next challenge' when the answer is correct. In both cases, correctness feedback contains 'Positive feedback', 'Answer analysis', and 'Comments for cheering up'. Hint feedback contains 'Positive feedback', 'Related past history', 'Similar problems', and 'Key notions of the problem'.
  • Figure 2: Pipeline to extract Question and Concept information for LKT from the CSEDM-19-Spring, CSEDM-19-Fall, and CodeWorkout-Spring2019 datasets. A model trained on the domain corpus uses this data to predict the MASK token.
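
The branching described in the Figure 1 caption can be illustrated with a minimal sketch. The section names below are taken from the caption; the function name `feedback_sections`, its parameters, the label "Answer correction", and the ordering of the sections are assumptions made for illustration and are not the authors' implementation.

```python
from typing import List, Optional

# Section names come from the Figure 1 caption; their order here is an assumption.
COMMON_CORRECTNESS_SECTIONS = [
    "Positive feedback",
    "Answer analysis",
    "Comments for cheering up",
]
HINT_SECTIONS = [
    "Positive feedback",
    "Related past history",
    "Similar problems",
    "Key notions of the problem",
]


def feedback_sections(submitted: bool, is_correct: Optional[bool] = None) -> List[str]:
    """Return the sections a feedback message would contain (hypothetical helper)."""
    if not submitted:
        # Hint feedback: the student has not submitted an answer yet.
        return list(HINT_SECTIONS)
    if is_correct is None:
        raise ValueError("is_correct is required once an answer has been submitted")
    if is_correct:
        # Correct answer: add suggestions for going further.
        return COMMON_CORRECTNESS_SECTIONS + ["Tips for improvement", "Next challenge"]
    # Wrong answer: add a correction of the submitted answer.
    return COMMON_CORRECTNESS_SECTIONS + ["Answer correction"]
```

Calling `feedback_sections(submitted=False)` yields the four hint sections, while `feedback_sections(True, is_correct=False)` yields the three shared correctness sections followed by the answer correction.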