Using Large Language Models for Student-Code Guided Test Case Generation in Computer Science Education

Nischal Ashok Kumar; Andrew Lan

TL;DR

The paper addresses scalable assessment in CS education by automatically generating test cases using a large language model guided by representative student code and a compiler feedback loop. Its core method is an iterative refinement framework that combines Code Pair Prompting, Code Pair Selection, and compiler-driven feedback to produce test sets that meaningfully measure student knowledge, with a formal notion that $P = Q \cdot s_i$ governs the targeted passing ratio for buggy code in prompts. The approach is evaluated on Java problems from the CSEDM challenge dataset, reporting varying accuracy across data types and highlighting how diversity in student bugs affects coverage. The work demonstrates potential for scalable, automated assessment and outlines avenues for future work, including adaptive testing and personalized feedback to support novice programmers.

Abstract

In computer science education, test cases are an integral part of programming assignments since they can be used as assessment items to test students' programming knowledge and provide personalized feedback on student-written code. The goal of our work is to propose a fully automated approach for test case generation that can accurately measure student knowledge, which is important for two reasons. First, manually constructing test cases requires expert knowledge and is a labor-intensive process. Second, developing test cases for students, especially novice programmers, differs significantly from developing test cases for professional software developers. Therefore, we need an automated process for test case generation to assess student knowledge and provide feedback. In this work, we propose a large language model-based approach to automatically generate test cases and show that they are good measures of student knowledge, using a publicly available dataset that contains student-written Java code. We also discuss future research directions centered on using test cases to help students.

Paper Structure (14 sections, 3 equations, 1 figure, 3 tables)

Introduction
Related Work
Test Case Generation Methodology
Code Pair Prompting
Code Pair Selection
Compiler Feedback and Iterative Refinement
Experiments
Datasets, Setup, and Metrics
Results and Discussion
Analysis
Qualitative Example
Conclusions and Future Work
Acknowledgements
Iterative Prompt Engineering

Figures (1)

Figure 1: Visualizing our overall approach for test case generation. TC stands for test case.
