The Robots are Here: Navigating the Generative AI Revolution in Computing Education

James Prather, Paul Denny, Juho Leinonen, Brett A. Becker, Ibrahim Albluwi, Michelle Craig, Hieke Keuning, Natalie Kiesler, Tobias Kohn, Andrew Luxton-Reilly, Stephen MacNeil, Andrew Peterson, Raymond Pettit, Brent N. Reeves, Jaromir Savelka

TL;DR

This ITiCSE Working Group article examines how large language models are reshaping computing education, reviewing 71 primary works, surveying 228 educators and students across 20 countries, and conducting in-depth interviews with 22 educators. It categorizes literature into five roles for LLMs and highlights that current models often rival or surpass average students on coding tasks, while posing risks around learning integrity, overreliance, and equity. The authors also benchmark LLMs on education-focused datasets (APPS, FalconCode) to assess progress, finding that newer models like GPT-4 significantly outperform earlier results, though dataset quality and task framing heavily influence outcomes. They offer practical guidance on curriculum design, assessment, ethics, and policy, advocating constructive alignment, explicit rules for GenAI usage, and replication-driven research to inform practice and policy in computing classrooms.

Abstract

Recent advancements in artificial intelligence (AI) are fundamentally reshaping computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and to leverage the opportunities presented by this new technology. In this working group report, we undertake a comprehensive exploration of LLMs in the context of computing education and make five significant contributions. First, we provide a detailed review of the literature on LLMs in computing education and synthesise findings from 71 primary articles. Second, we report the findings of a survey of computing students and instructors from across 20 countries, capturing prevailing attitudes towards LLMs and their use in computing education contexts. Third, to understand how pedagogy is already changing, we offer insights collected from in-depth interviews with 22 computing educators from five continents who have already adapted their curricula and assessments. Fourth, we use the ACM Code of Ethics to frame a discussion of ethical issues raised by the use of large language models in computing education, and we provide concrete advice for policy makers, educators, and students. Finally, we benchmark the performance of LLMs on various computing education datasets, and highlight the extent to which the capabilities of current models are rapidly improving. Our aim is that this report will serve as a focal point for both researchers and practitioners who are exploring, adapting, using, and evaluating LLMs and LLM-based tools in computing classrooms.

Paper Structure (99 sections, 4 figures, 12 tables)

This paper contains 99 sections, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Phases of the literature review.
  • Figure 2: Summaries of the survey responses from 171 students and 57 instructors: 1) students' and instructors' perspectives were compared using Likert-scale responses, 2) students ranked their help-seeking preferences from 1 to 6, and 3) instructors shared their beliefs about the ethical use of generative AI tools.
  • Figure 3: A comparison of the original results and the scores achieved by GPT-4 on the two CS1 tests and Rainfall-problem variants presented in Finnie-Ansley et al. (2022).
  • Figure 4: GPT success rate for different exercise types.