From Coders to Critics: Empowering Students through Peer Assessment in the Age of AI Copilots
Santiago Berrezueta-Guzman, Stephan Krusche, Stefan Wagner
TL;DR
The paper investigates how well rubric-based, anonymized peer assessment can approximate instructor grading in a large introductory programming course in the era of AI copilots. Drawing on 47 teams, it combines rubric-guided peer reviews with instructor scores and reflective surveys, reporting moderate alignment (peer–instructor correlation $r \approx 0.5$–$0.55$; MAE $\approx 9.18$–$10.68$; RMSE $\approx 14.87$–$16.37$) and high student engagement. The findings suggest that structured peer assessment can be a scalable, trustworthy complement to instructor feedback, promoting evaluative thinking and perceptions of fairness when coding is AI-assisted. The study offers design implications for peer evaluation systems and outlines directions for training, calibration, and broader deployment in AI-augmented software engineering education.
Abstract
The rapid adoption of AI-powered coding assistants such as ChatGPT and other coding copilots is transforming programming education, raising questions about assessment practices, academic integrity, and skill development. As educators seek alternatives to traditional grading methods that are susceptible to AI-enabled plagiarism, structured peer assessment is a promising strategy. This paper presents an empirical study of a rubric-based, anonymized peer review process implemented in a large introductory programming course. Students evaluated each other's final projects (a 2D game), and their assessments were compared to instructor grades using correlation, mean absolute error (MAE), and root mean square error (RMSE). Additionally, reflective surveys from 47 teams captured student perceptions of fairness, grading behavior, and preferences regarding grade aggregation. Results show that peer review can approximate instructor evaluation with moderate accuracy and can foster student engagement, evaluative thinking, and interest in providing good feedback to peers. We discuss the implications of these findings for designing scalable, trustworthy peer assessment systems in the age of AI-assisted coding.
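To make the three agreement measures named above concrete, the following is a minimal sketch of how a peer–instructor comparison could be computed. It assumes one peer score and one instructor score per team on the same numeric scale, and it uses the Pearson coefficient as the correlation measure; the abstract does not state which coefficient or aggregation the study used. The function name `agreement_metrics` and the sample scores are hypothetical and are not taken from the study's data.

```python
import math


def agreement_metrics(peer, instructor):
    """Return Pearson r, MAE, and RMSE between two equal-length score lists.

    Assumption: each position holds one team's peer score and the
    instructor's score for the same team, on the same scale.
    """
    if len(peer) != len(instructor) or len(peer) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")

    n = len(peer)
    mean_p = sum(peer) / n
    mean_i = sum(instructor) / n

    # Pearson correlation: covariance divided by the product of the
    # standard deviations (the 1/n factors cancel).
    cov = sum((p - mean_p) * (i - mean_i) for p, i in zip(peer, instructor))
    var_p = sum((p - mean_p) ** 2 for p in peer)
    var_i = sum((i - mean_i) ** 2 for i in instructor)
    if var_p == 0 or var_i == 0:
        raise ValueError("correlation is undefined for constant scores")
    r = cov / math.sqrt(var_p * var_i)

    # Error-based measures on the per-team score differences.
    errors = [p - i for p, i in zip(peer, instructor)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    return {"r": r, "mae": mae, "rmse": rmse}


# Hypothetical scores for four teams (illustration only, not study data).
peer_scores = [80, 90, 70, 60]
instructor_scores = [70, 90, 80, 60]
metrics = agreement_metrics(peer_scores, instructor_scores)
print(metrics)
```

Because RMSE squares each difference before averaging, it is never smaller than MAE and grows faster when a few teams are graded very differently by peers and instructors. A gap between the two, as in the ranges reported in the TL;DR, is therefore consistent with a small number of large disagreements rather than uniform drift.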
