CFlow: Supporting Semantic Flow Analysis of Students' Code in Programming Problems at Scale
Ashley Ge Zhang, Xiaohang Tang, Steve Oney, Yan Chen
TL;DR
CFlow tackles the problem of analyzing thousands of student code submissions with a scalable visualization that combines semantic aggregation with code structure. It uses CodeBERT to embed line-level semantics and an LLM to label line-level errors, and presents the results through three synchronized views (SAV, SHV, CDV) driven by a four-stage algorithm (identify steps, align lines, detect errors, cluster results). In an evaluation against a baseline, CFlow sped up pattern identification, increased accuracy, and improved the discovery of common mistakes in large classes. The findings suggest practical value for instructors who need feedback and pattern exploration at scale in CS education.
Abstract
The high demand for computer science education has led to high enrollments, with thousands of students in many introductory courses. In such large courses, it can be overwhelmingly difficult for instructors to understand class-wide problem-solving patterns or issues, which is crucial for improving instruction and addressing important pedagogical challenges. In this paper, we propose a technique and system, CFlow, for creating understandable and navigable representations of code at scale. CFlow represents thousands of code samples in a visualization that resembles a single code sample. CFlow creates scalable code representations by (1) clustering individual statements with similar semantic purposes, (2) presenting clustered statements in a way that maintains semantic relationships between statements, (3) representing the correctness of different variations as a histogram, and (4) allowing users to navigate through solutions interactively using semantic filters. With a multi-level view design, users can navigate both high-level patterns and low-level implementations. This is in contrast to prior tools that either limit their focus to isolated statements (and thus discard the surrounding context of those statements) or cluster entire code samples (which can lead to large numbers of clusters: for example, if there are n code features and m implementations of each, there can be m^n clusters). We evaluated the effectiveness of CFlow in a comparison study and found that participants using CFlow spent only half the time identifying mistakes and recalled twice as many desired patterns from over 6,000 submissions.
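The m^n figure can be checked with a small counting sketch. This is only an illustration, not CFlow's algorithm: the step names and code variants below are hypothetical, and the n * m count assumes that statement-level grouping yields exactly one group per variant of each step.

```python
from itertools import product

# Hypothetical problem with n = 3 steps and m = 2 implementations of each.
# The step names and code lines are made up for illustration.
steps = {
    "init": ["total = 0", "total = int()"],
    "loop": ["for x in nums:", "for x in list(nums):"],
    "accumulate": ["total += x", "total = total + x"],
}

def whole_program_clusters(steps):
    """Clustering entire samples: each combination of variants is a cluster."""
    return list(product(*steps.values()))

def statement_level_groups(steps):
    """Grouping per statement: one group per (step, variant), in step order."""
    return [(name, variant)
            for name, variants in steps.items()
            for variant in variants]

n = len(steps)
m = len(next(iter(steps.values())))
programs = whole_program_clusters(steps)
groups = statement_level_groups(steps)

print(f"whole-program clusters: {len(programs)} (m ** n = {m ** n})")
print(f"statement-level groups: {len(groups)} (n * m = {n * m})")
```

With these values the whole-program count is 8 and the statement-level count is 6. The gap widens quickly: at n = 10 and m = 3, whole-program clustering can produce up to 59,049 clusters, against 30 statement-level groups.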
