Which Feedback Works for Whom? Differential Effects of LLM-Generated Feedback Elements Across Learner Profiles

Momoka Furuhashi; Kouta Nakayama; Noboru Kawai; Takashi Kodama; Saku Sugawara; Kyosuke Takami

TL;DR

The paper investigates how fine-grained LLM-generated feedback elements influence learning gains and learner acceptance across learner profiles defined by the Big Five personality traits. It defines six content-oriented feedback elements, generates feedback with GPT-5 for multiple-choice biology questions, and tests the elements in an experiment with 321 high school students, using two learning-outcome measures and six subjective criteria, followed by personality-based clustering. The results show that Baseline, Coverage, and Keywords best support answer revision and learning, while subjective preferences vary across personality clusters, which points to the value of personality-aware feedback design. The work suggests that aligning feedback design with learner traits can support scalable, adaptive feedback in education, and it provides a foundation for further personalization of LLM-assisted instruction.

Abstract

Large language models (LLMs) show promise for automatically generating feedback in educational settings. However, it remains unclear how specific feedback elements, such as tone and information coverage, contribute to learning outcomes and learner acceptance, particularly across learners with different personality traits. In this study, we define six feedback elements and generate feedback for multiple-choice biology questions using GPT-5. We conduct a learning experiment with 321 first-year high school students and evaluate feedback effectiveness using two learning-outcome measures and subjective evaluations across six criteria. We further analyze how feedback acceptance varies across learners with different Big Five personality traits. Our results show that effective feedback elements share common patterns that support learning outcomes, while learners' subjective preferences differ across personality-based clusters. These findings highlight the importance of selecting and adapting feedback elements according to learners' personality traits when designing LLM-generated feedback, and they provide practical implications for personalized feedback design in education.
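
As a rough illustration of how a per-element learning-outcome measure such as the accuracy of the first revised attempt might be computed, the following minimal sketch aggregates correctness by feedback element. The record layout, the function name first_revision_accuracy, and the toy values are assumptions made for illustration only; they are not the authors' data or analysis code.

```python
from collections import defaultdict

# Hypothetical toy records: (feedback_element, first_revision_correct).
# These values are placeholders for illustration, not data from the study.
records = [
    ("Baseline", True),
    ("Baseline", False),
    ("Keywords", True),
    ("Keywords", True),
    ("Positivity", False),
    ("Positivity", True),
    ("Positivity", False),
]


def first_revision_accuracy(rows):
    """Return {element: share of first revised attempts that were correct}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for element, is_correct in rows:
        total[element] += 1
        correct[element] += int(is_correct)
    return {element: correct[element] / total[element] for element in total}


accuracy = first_revision_accuracy(records)

# Print elements from highest to lowest accuracy.
for element, acc in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{element}: {acc:.2f}")
```

The same grouping could be repeated within each personality-based cluster to compare elements across learner profiles, assuming each record also carried a cluster label.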

Paper Structure (29 sections, 1 equation, 7 figures)

Introduction
Related Work
Feedback
Feedback Generation
Big Five
Feedback Elements
Baseline
Keywords
Actionability
Novelty
Coverage
Positivity
Experience
Participants
Dataset
...and 14 more sections

Figures (7)

Figure 1: Overview of this study. First, we define six feedback elements related to feedback content and generate feedback using GPT-5. Next, based on data from 321 participants, we analyze learning outcomes using two complementary measures of correctness and subjective evaluations across six criteria. Finally, we examine which feedback elements are more favorably received across learners with different Big Five personality traits.
Figure 2: Overview of the experimental procedure. Participants first provide informed consent and complete a pre-questionnaire. They then answer the task questions and evaluate feedback using a three-point scale across six criteria, such as trustworthiness and expression quality. If they answer incorrectly, they repeat the cycle of answering, reviewing the feedback, and evaluating it until they reach the correct solution.
Figure 3: User interface of the system used in this study. Participants read the question prompt, select an answer option, and submit their response. The system then provides feedback, which participants evaluate using a three-point scale across six criteria. This example shows Positivity.
Figure 4: Accuracy of the first revised attempt. Baseline, Coverage, and Keywords are the most effective methods for supporting answer revision, in this order.
Figure 5: Results of the subjective evaluations for each feedback element. Keywords, Baseline, and Actionability tend to receive higher ratings across many criteria.
...and 2 more figures
