AI Mentors for Student Projects: Spotting Early Issues in Computer Science Proposals
Gati Aher, Robin Schmucker, Tom Mitchell, Zachary C. Lipton
TL;DR
This paper addresses the early detection of problems in computer science project proposals for project-based learning (PBL) by building a software system that collects student proposals and aptitude signals. Using a 29-item rubric, two human experts and GPT-4o rate the proposals, which enables analysis of expert–LLM agreement and identification of students who may need more guidance. Findings indicate that GPT-4o can approximate expert judgments and that experienced students tend to write higher-quality proposals than novices, while motivation to use the system remains high. The work suggests that LLM-based rating could scale readiness assessments for PBL, but it emphasizes the need for reliable predictive indicators of learner success and for careful consideration of equity, cost, and generalizability to classroom settings.
Abstract
When executed well, project-based learning (PBL) engages students' intrinsic motivation, encourages students to learn far beyond a course's limited curriculum, and prepares students to think critically and maturely about the skills and tools at their disposal. However, educators experience mixed results when using PBL in their classrooms: some students thrive with minimal guidance and others flounder. Early evaluation of project proposals could help educators determine which students need more support, yet evaluating project proposals and student aptitude is time-consuming and difficult to scale. In this work, we design, implement, and conduct an initial user study (n = 36) for a software system that collects project proposals and aptitude information to support educators in determining whether a student is ready to engage with PBL. We find that (1) users perceived the system as helpful for writing project proposals and identifying tools and technologies to learn more about, (2) educator ratings indicate that users with less technical experience in the project topic tend to write lower-quality project proposals, and (3) GPT-4o's ratings show agreement with educator ratings. While the prospect of using LLMs to rate the quality of students' project proposals is promising, the long-term effectiveness of this approach hinges on future efforts to characterize indicators that reliably predict students' success and motivation to learn.
