How Large Language Models Are Changing MOOC Essay Answers: A Comparison of Pre- and Post-LLM Responses
Leo Leppänen, Lili Aunimo, Arto Hellas, Jukka K. Nurminen, Linda Mannila
TL;DR
This paper investigates how student essays in a free AI ethics MOOC changed after ChatGPT's release, using a longitudinal dataset from 2020–2024. It compares pre- and post-ChatGPT writing in terms of style, length, readability, vocabulary, and topics, using token and sentence counts, Flesch scores, Type-Token Ratios, term prevalence, and both standard and dynamic topic models. The results show that post-ChatGPT essays are longer, easier to read, and contain more LLM-related terminology, with reduced lexical diversity and no broad shifts in topics, which suggests substantial LLM-assisted writing among participants. These findings raise questions about the value of MOOC certificates and motivate consideration of integrity-preserving measures in online education.
Abstract
The release of ChatGPT in late 2022 caused a flurry of activity and concern in the academic and educational communities. Some see the tool's ability to generate human-like text that passes at least cursory inspections for factual accuracy "often enough" as heralding a golden age of information retrieval and computer-assisted learning. Others worry that the tool may lead to unprecedented levels of academic dishonesty and cheating. In this work, we quantify some of the effects of the emergence of Large Language Models (LLMs) on online education by analyzing a multi-year dataset of student essay responses from a free university-level MOOC on AI ethics. Our dataset includes essays submitted both before and after ChatGPT's release. We find that the launch of ChatGPT coincided with significant changes in both the length and style of student essays, mirroring observations in other contexts such as academic publishing. We also observe, as expected based on related public discourse, changes in the prevalence of key content words related to AI and LLMs, but not necessarily in the general themes or topics discussed in the student essays as identified through (dynamic) topic modeling.
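To make two of the style measures concrete, the following is a minimal sketch of a Type-Token Ratio and a Flesch Reading Ease score. It is an illustration under stated assumptions, not the authors' pipeline: the regex tokenizer, the sentence splitter, and the vowel-run syllable counter are all simplifications chosen here, and the function names are hypothetical. Only the Flesch Reading Ease formula itself is standard.

```python
import re


def tokenize(text):
    # Assumption: lowercase alphabetic word tokens with optional apostrophe;
    # the paper's actual tokenizer is not specified here.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(text):
    # Unique word forms divided by total word tokens; lower means less diverse.
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word):
    # Crude heuristic (assumption): one syllable per run of vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_reading_ease(text):
    # Standard formula: 206.835 - 1.015 * (words / sentences)
    #                           - 84.6 * (syllables / words)
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    essay = "The cat sat. The dog ran."
    print(type_token_ratio(essay))
    print(flesch_reading_ease(essay))
```

Higher Flesch scores indicate easier text. Raw Type-Token Ratio also tends to fall as texts get longer, so any comparison between essays of different lengths would need to account for that.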
