Equity in the Use of ChatGPT for the Classroom: A Comparison of the Accuracy and Precision of ChatGPT 3.5 vs. ChatGPT4 with Respect to Statistics and Data Science Exams

Monnie McGee, Bivin Sadler

TL;DR

The study investigates equity in access to ChatGPT by comparing the free ChatGPT3.5 with the paid ChatGPT4 on four national exams and one graduate exam in statistics and data science. Using paired analyses (McNemar's test) and an ordinal logistic regression model, it finds that ChatGPT4 substantially outperforms ChatGPT3.5 overall, with the largest gains on questions that contain images. The results imply a meaningful learning disparity tied to subscription status, particularly when visual data is involved, which raises equity and accessibility concerns for AI-assisted education. The work suggests that future advances (e.g., ChatGPT4o) and broader platform comparisons may influence how AI tutoring affects educational equity across disciplines.
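To make the paired comparison concrete, here is a minimal sketch of an exact McNemar test, assuming each question is scored simply as correct or incorrect for both models. The function name `mcnemar_exact` and the answer vectors are hypothetical illustrations, not the authors' code or data, and the paper may use a different variant of the test.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: questions only the first model answered correctly
    c: questions only the second model answered correctly
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under H0, each discordant pair favors either model with probability 1/2,
    # so the smaller count follows a Binomial(n, 1/2) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-question outcomes (True = correct). NOT data from the paper.
gpt35 = [True, False, False, True, False, False, True, False, False, False]
gpt4 = [True, True, True, True, False, True, True, True, True, False]

# Only discordant pairs carry information about a difference between models.
b = sum(x and not y for x, y in zip(gpt35, gpt4))  # ChatGPT3.5 right, ChatGPT4 wrong
c = sum(y and not x for x, y in zip(gpt35, gpt4))  # ChatGPT4 right, ChatGPT3.5 wrong
p_value = mcnemar_exact(b, c)
print(f"b={b}, c={c}, p={p_value:.4f}")
```

Concordant pairs (both right or both wrong) drop out of the statistic, which is why the figure list separates plots of all pairs from plots of discordant pairs only.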

Abstract

A college education has historically been seen as a method of moving upward with regard to income brackets and social status. Indeed, many colleges recognize this connection and seek to enroll talented low-income students. While these students might have their education, books, room, and board paid for, there are other items that they might be expected to use that are not part of most college scholarship packages. One of those items that has recently surfaced is access to generative AI platforms. The most popular of these platforms is ChatGPT, and it has a paid version (ChatGPT4) and a free version (ChatGPT3.5). We seek to explore differences between the free and paid versions in the context of homework questions and data analyses as might be seen in a typical introductory statistics course. We determine the extent to which students who cannot afford newer and faster versions of generative AI programs would be disadvantaged in terms of writing such projects and learning these methods.

Paper Structure

This paper contains 12 sections, 8 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Scores of students on the methods exam compared to scores from ChatGPT3.5 and ChatGPT4. Each bar represents a student, and the two bars on the far right represent the generative AI platforms. The dashed horizontal line represents the mean of the students' scores excluding scores from generative AI software.
  • Figure 2: Spaghetti plot showing concordant and discordant pairs. The lines have been jittered for visibility.
  • Figure 3: Spaghetti plot showing only discordant pairs. The lines have been jittered for visibility.
  • Figure 4: Spaghetti plot showing concordant and discordant pairs. The lines have been jittered for visibility.
  • Figure 5: Spaghetti plot showing only discordant pairs. The lines have been jittered for visibility.
  • ...and 4 more figures