Can ChatGPT Pass a Theory of Computing Course?

Matei A. Golesteanu, Garrett B. Vowinkel, Ryan E. Dougherty

TL;DR

This study evaluates ChatGPT-4 on a Theory of Computing (ToC) course through two experiments: taking the institution's exams and answering a 450-question ToC dataset spanning core topics. It finds that ChatGPT achieves exam performance averaging roughly in the B range and improves with retries, but struggles with open-ended proofs and with questions outside its training data; prompt engineering can noticeably boost performance, especially on multiple-choice items. The work draws practical implications for course design and academic integrity, suggesting closed-book exams as core assessments while using LLM-based tasks to foster critical thinking and error detection. It also outlines future directions such as testing other LLMs, expanding topic coverage, and developing ToC-tailored tutoring tools.

Abstract

Large Language Models (LLMs) have had considerable difficulty when prompted with mathematical questions, especially those within theory of computing (ToC) courses. In this paper, we detail two experiments regarding our own ToC course and the ChatGPT LLM. For the first, we evaluated ChatGPT's ability to pass our own ToC course's exams. For the second, we created a database of sample ToC questions and responses to accommodate other ToC offerings' choices for topics and structure. We scored each of ChatGPT's outputs on these questions. Overall, we determined that ChatGPT can pass our ToC course, and is adequate at understanding common formal definitions and answering "simple"-style questions, e.g., true/false and multiple choice. However, ChatGPT often makes nonsensical claims in open-ended responses, such as proofs.

Paper Structure (14 sections, 1 figure, 3 tables)