LLM-Generated Tips Rival Expert-Created Tips in Helping Students Answer Quantum-Computing Questions

Lars Krupp; Jonas Bley; Isacco Gobbi; Alexander Geng; Sabine Müller; Sungho Suh; Ali Moghiseh; Arcesio Castaneda Medina; Valeria Bartsch; Artur Widera; Herwig Ott; Paul Lukowicz; Jakob Karolus; Maximilian Kiefer-Emmanouilidis

TL;DR

The study addresses the shortage of quantum computing (QC) educators by evaluating whether LLM-generated tips can match expert-created tips for teaching QC basics. It comprises two complementary studies: a main between-subjects study (N=46) that varies both the tip creator and the label shown to participants, and a tip-evaluation study (N=23) in which educators and students rate tip quality, correctness, and helpfulness. LLM-generated tips were rated as significantly more helpful and better at pointing towards relevant concepts than expert-created tips, but they were also more prone to giving away the answer and more verbose. Participants performed significantly better when tips were labeled as LLM-generated, even when an expert had written them, which may reflect a placebo effect. The work supports integrating LLM-generated tips into scalable, personalized education for QC basics, provided that design considerations, validation, and human-in-the-loop checks are in place to mitigate risks and preserve learning gains.

Abstract

Individual teaching is among the most successful ways to impart knowledge. Yet this method is not always feasible because of the large number of students per educator. Quantum computing is a prime example of this problem, owing to the hype surrounding it. Alleviating the high teacher workload that often accompanies individual teaching is crucial for sustaining high-quality education. Therefore, leveraging Large Language Models (LLMs) such as GPT-4 to generate educational content can be valuable. We conducted two complementary studies exploring the feasibility of using GPT-4 to automatically generate tips for students. In the first, students (N=46) solved four multiple-choice quantum computing questions with the help of either expert-created or LLM-generated tips. To correct for possible biases towards LLMs, we introduced two additional conditions in which participants were led to believe that they were given expert-created tips when they were in fact given LLM-generated tips, and vice versa. Our second study (N=23) aimed to compare the LLM-generated and expert-created tips directly, evaluating their quality, correctness, and helpfulness, with both experienced educators and students participating. Participants in the second study found that the LLM-generated tips were significantly more helpful and pointed better towards relevant concepts than the expert-created tips, while being more prone to giving away the answer. Participants in the first study performed significantly better in answering the quantum computing questions when given tips labeled as LLM-generated, even if the tips were created by an expert. This phenomenon could be a placebo effect induced by the participants' biases towards LLM-generated content. Ultimately, we find that LLM-generated tips are good enough to be used instead of expert tips in the context of quantum computing basics.
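The main study described above crosses two factors, who actually wrote a tip and who the participant is told wrote it, which yields four between-subjects conditions. The following is a minimal sketch of such a 2x2 assignment, not the authors' procedure: the function name `assign_conditions`, the fixed seed, and the balanced round-robin allocation are illustrative assumptions, since the abstract does not state how participants were allocated to conditions.

```python
import itertools
import random

CREATORS = ("expert", "llm")  # who actually wrote the tip
LABELS = ("expert", "llm")    # who the participant is told wrote the tip


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to the four creator-label conditions.

    Hypothetical helper: balanced allocation is an assumption, not a
    detail taken from the paper.
    """
    conditions = list(itertools.product(CREATORS, LABELS))
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal shuffled participants round-robin so that group sizes
    # differ by at most one.
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


# N=46 matches the main study's sample size reported in the abstract.
assignment = assign_conditions(range(46))

# Conditions where the label does not match the true creator are the
# two "additional" conditions used to probe bias towards LLMs.
mislabeled = [pid for pid, (creator, label) in assignment.items() if creator != label]
```

Separating the creator from the label in this way is what allows an effect of the label alone to be distinguished from an effect of the tip content.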

Paper Structure (49 sections, 11 figures, 10 tables)

Introduction
Methods
RQ1: Can LLM-generated tips be used instead of expert-created tips to help students answer quantum-physics questions?
RQ2: What are the adverse effects of LLM-generated tips?
Multiple-Choice Question Creation
Tip Creation
Main Study
Main Study Procedure
Main Study Participants
Tip Evaluation Study
Tip Evaluation Study Procedure
Tip Evaluation Study Participants
Results
Main Study Results
Custom questions
...and 34 more sections

Figures (11)

Figure 1: A screenshot of how tips were displayed in the main study.
Figure 2: The complete main study procedure, detailing each step.
Figure 3: Two boxplots depicting the score achieved by main study participants dependent on creator and label of the provided tip (left) and the score achieved by main study participants dependent on only the label (right). Significant differences are marked with *.
Figure 4: Boxplot of the scores for the custom questions (see the table labeled tab:main_custom_questions). The questions were rated on a visual analogue scale ranging from 0 to 100. The conditions are given as creator-label pairs.
Figure 5: Raincloud plot showing the differences between LLM-generated and expert-created tips for the custom questions (see the table labeled tab:tip_eval_custom_questions). The raincloud plot consists of point clouds depicting the participants' ratings, with boxplots and violin plots showing the distribution of ratings. All measures were rated on a visual analogue scale ranging from 0 to 100.
...and 6 more figures
