Using Generative Text Models to Create Qualitative Codebooks for Student Evaluations of Teaching

Andrew Katz, Mitchell Gerhardt, Michelle Soledad

TL;DR

The paper tackles the challenge of extracting meaningful qualitative themes from large-scale student feedback by introducing the EECS workflow (Extract-Embed-Cluster-Summarize). It demonstrates a fully open-source, locally runnable pipeline that processes 5,000 SETs (4,672 unique comments; 12,046 extracted ideas) into 272 clusters (232 codes) and, by way of a retrieval-augmented generation step, into a concise codebook of 159 codes that is also aligned with a broader academic framework (APM). The approach reproduces and extends human-coded insights, offering scalable, privacy-preserving qualitative analysis while underscoring the need for human-in-the-loop oversight to ensure code quality and relevance. The work suggests that NLP-based inductive coding can broaden the scope of qualitative analysis in education and beyond, enabling routine analysis of diverse text data such as SETs, essays, and administrative records.

Abstract

Feedback is a critical aspect of improvement. Unfortunately, when there is a lot of feedback from multiple sources, it can be difficult to distill the information into actionable insights. Consider student evaluations of teaching (SETs), which are important sources of feedback for educators. They can give instructors insights into what worked during a semester. A collection of SETs can also be useful to administrators as signals for courses or entire programs. However, at a large scale, as in high-enrollment courses or administrative records spanning several years, the volume of SETs can make them difficult to analyze. In this paper, we discuss a novel method for analyzing SETs using natural language processing (NLP) and large language models (LLMs). We demonstrate the method by applying it to a corpus of 5,000 SETs from a large public university. We show that the method can be used to extract, embed, cluster, and summarize the SETs to identify the themes they express. More generally, this work illustrates how to combine NLP techniques and LLMs to generate a codebook for SETs. We conclude by discussing the implications of this method for analyzing SETs and other types of student writing in teaching and research settings.
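
To make the extract, embed, cluster, and summarize steps concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the paper relies on generative text models and learned embeddings, whereas this sketch uses only the standard library. Every name in it (`extract_ideas`, `embed`, `cluster`, `summarize`, `build_codebook`), the stopword list, and the similarity threshold are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "but", "was", "were", "is",
             "to", "of", "in", "very", "i", "it", "too"}


def extract_ideas(comment):
    """Extract: split a comment into single-idea fragments.
    Stand-in for prompting a generative model to list the ideas."""
    parts = re.split(r"[.;!?]|\bbut\b|\band\b", comment.lower())
    return [p.strip() for p in parts if p.strip()]


def embed(idea):
    """Embed: sparse bag-of-words counts.
    Stand-in for a learned sentence-embedding model."""
    tokens = re.findall(r"[a-z]+", idea)
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(ideas, threshold=0.3):
    """Cluster: greedy single pass; an idea joins the most similar
    existing cluster if similarity >= threshold, else starts a new one."""
    clusters = []
    for idea in ideas:
        vec = embed(idea)
        if not vec:  # nothing left after stopword removal
            continue
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [idea]})
        else:
            best["centroid"].update(vec)  # accumulate term counts
            best["members"].append(idea)
    return clusters


def summarize(c, n_terms=2):
    """Summarize: label a cluster by its most frequent terms.
    Stand-in for asking a generative model to name and define a code."""
    terms = [t for t, _ in c["centroid"].most_common(n_terms)]
    return {"code": " / ".join(terms),
            "n_ideas": len(c["members"]),
            "examples": c["members"][:2]}


def build_codebook(comments):
    """Run the four steps in order on a list of raw comments."""
    unique = list(dict.fromkeys(comments))  # drop exact duplicates
    ideas = [idea for c in unique for idea in extract_ideas(c)]
    return [summarize(c) for c in cluster(ideas)]


if __name__ == "__main__":
    sample = [
        "The lectures were clear. Homework was too long.",
        "Lectures were clear and engaging",
        "The homework took too long!",
        "The lectures were clear. Homework was too long.",
    ]
    for entry in build_codebook(sample):
        print(entry)
```

In a realistic setting each stand-in would be swapped for a model call, and the resulting codes would still need human review, in line with the human-in-the-loop oversight the summary emphasizes.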

Paper Structure (29 sections, 2 figures, 9 tables)