Could an Artificial-Intelligence agent pass an introductory physics course?
Gerd Kortemeyer
TL;DR
The paper evaluates whether a state-of-the-art AI language model (ChatGPT, Jan 2023) can pass a calculus-based introductory physics course by having it solve representative assessments (FCI, homework, clickers, programming, exams) and grading it like a human student. It finds that ChatGPT would narrowly pass the course while exhibiting persistent novice misconceptions and arithmetic errors; it performs exceptionally well only on programming tasks and underperforms on conceptual and numerical reasoning. The results raise important questions for physics education about integrity, assessment design, and the skills students must develop to work with AI. The study suggests focusing on metacognition, conceptual understanding, and computation-enabled curricula to prepare learners for AI-enabled environments.
Abstract
Massive pre-trained language models have garnered attention and controversy due to their ability to generate human-like responses: attention due to their frequent indistinguishability from human-generated phraseology and narratives, and controversy because their convincingly presented arguments and facts are frequently simply false. Just how human-like are these responses when it comes to dialogues about physics, in particular about the standard content of introductory physics courses? This study explores that question by having ChatGPT, the pre-eminent language model in 2023, work through representative assessment content of an actual calculus-based physics course and grading the responses in the same way human responses would be graded. As it turns out, ChatGPT would narrowly pass this course while exhibiting many of the preconceptions and errors of a beginning learner.
