SPHERE: Scaling Personalized Feedback in Programming Classrooms with Structured Review of LLM Outputs

Xiaohang Tang, Sam Wong, Marcus Huynh, Zicheng He, Yalong Yang, Yan Chen

TL;DR

SPHERE is an interactive system that leverages Large Language Models and structured LLM output review to scale personalized feedback for in-class coding activities, addressing the challenges of real-time response, issue prioritization, and large-scale personalization.

Abstract

Effective personalized feedback is crucial for learning programming. However, providing personalized, real-time feedback in large programming classrooms poses significant challenges for instructors. This paper introduces SPHERE, an interactive system that leverages Large Language Models (LLMs) and structured LLM output review to scale personalized feedback for in-class coding activities. SPHERE employs two key components: an Issue Recommendation Component that identifies critical patterns in students' code and discussion, and a Feedback Review Component that uses a "strategy-detail-verify" approach for efficient feedback creation and verification. An in-lab, between-subject study demonstrates SPHERE's effectiveness in improving feedback quality and the overall feedback review process compared to a baseline system using off-the-shelf LLM outputs. This work contributes a novel approach to scaling personalized feedback in programming education, addressing the challenges of real-time response, issue prioritization, and large-scale personalization.

Paper Structure

This paper contains 48 sections, 14 figures, 4 tables.

Figures (14)

  • Figure 1: SPHERE's Workflow Overview. Once students' conversation logs, code history, and code errors come in (1), SPHERE continuously identifies critical issues and recommends them to the instructors (2). Instructors select the critical issues for feedback, which are then summarized and categorized to create Feedback Templates (3). These templates are previewed by instructors and further clustered (4) with relevant Code Evidence (5) and Conversation Evidence (6) to provide context and support a rapid verification process. This results in personalized feedback being sent to each student (7).
  • Figure 2: The Critical Issue Recommendation Panel consists of a Scatterplot, a Critical Issue View, and a Detailed Issue. (1) The Scatterplot changes based on the group view. Each Critical Issue component consists of (2) the name of the critical issue, (3) a description of the critical issue, (4) the top two sub-issues, (7) the number of students with this critical issue, and (8) the average pass rate of students with this critical issue. Instructors can (5) note down and save the critical issue or (9) give feedback on the critical issue. Instructors can sort critical issues by pass rate, number of students, or severity. (10) Selecting a critical issue displays an example submission and (11) highlights the relevant data points.
  • Figure 3: User interface for creating and reviewing LLM-generated feedback. (1) Instructors can select different feedback types, which will select a subset of (2) feedback components. (3) Instructors can then generate feedback for preview.
  • Figure 4: Feedback is generated for each sub-issue and includes the name of the sub-issue (4), the number of students in the sub-issue (1), the feedback template with components as placeholders (2), and the component descriptions (3). Instructors can edit the template (5). Feedback is then generated in clusters that include code evidence (9) and conversation evidence (10). Feedback for each component is highlighted in its corresponding color (6). Instructors can choose to edit or send individual feedback (7, 8).
  • Figure 5: User interface for the annotator tool. Annotators select an activity from each student (left) and provide a label by reviewing the playback for the selected activity.
  • ...and 9 more figures