Supporting Self-Reflection at Scale with Large Language Models: Insights from Randomized Field Experiments in Classrooms

Harsh Kumar; Ruiwei Xiao; Benjamin Lawson; Ilya Musabirov; Jiakai Shi; Xinyuan Wang; Huayin Luo; Joseph Jay Williams; Anna Rafferty; John Stamper; Michael Liut

TL;DR

This work highlights that focusing solely on the accuracy of LLMs can overlook their potential to enhance metacognitive skills through practices such as self-reflection, and underscores the utility of LLM-guided reflection and questionnaire-based activities in improving learning outcomes.

Abstract

Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language Models (LLMs) into the reflection process could mitigate these limitations. In this paper, we conducted two randomized field experiments in undergraduate computer science courses to investigate the potential of LLMs to help students engage in post-lesson reflection. In the first experiment (N=145), students completed a take-home assignment with the support of an LLM assistant; half of these students were then provided access to an LLM designed to facilitate self-reflection. The results indicated that the students assigned to LLM-guided reflection reported increased self-confidence and performed better on a subsequent exam two weeks later than their peers in the control condition. In the second experiment (N=112), we evaluated the impact of LLM-guided self-reflection against other scalable reflection methods, such as questionnaire-based activities and review of key lecture slides, after the assignment. Our findings suggest that the students in the questionnaire and LLM-based reflection groups performed equally well and better than those who were only exposed to lecture slides, according to their scores on a proctored exam two weeks later on the same subject matter. These results underscore the utility of LLM-guided reflection and questionnaire-based activities in improving learning outcomes. Our work highlights that focusing solely on the accuracy of LLMs can overlook their potential to enhance metacognitive skills through practices such as self-reflection. We discuss the implications of our research for the EdTech community.

Paper Structure (34 sections, 6 figures)

Introduction
Related Work
Role of Reflection in Education
LLMs in Classroom
Study-1
Experimental Design
Domain and Stimuli
Participants
Analysis
Results
Impact on Performance and Learning
Impact on Students' Self-Confidence in the Subject
Subjective Ratings for Helpfulness and Willingness to Interact Again
Common Themes Across LLM-Student Interactions
Positive Feedback and Affirmations
...and 19 more sections

Figures (6)

Figure 1: Stimuli for Study-1. A) Half of the students received a link to the reflection bot after completing their assignment. B) Example chat window for reflection.
Figure 2: Comparative Analysis of Student Outcomes in Reflection vs. No-Reflection Conditions for Study 1. The left panel presents the mean final exam scores obtained two weeks post-assignment, indicating higher performance among students in the reflection group. The center panel assesses the perceived helpfulness of the LLM-tutor for other topics, as influenced by the assigned condition. Finally, the right panel evaluates the willingness of the students to interact again with the LLM tutor, highlighting a greater inclination among those in the reflection group to seek further interaction. Error bars represent standard errors.
Figure 3: Change in students' self-confidence from the beginning to the end of the assignment based on their engagement in reflection activities. Error bars represent standard error.
Figure 4: Non-LLM Interfaces Used for Study-2. The LLM-based reflection (Condition-2) looked similar to the interface in Study-1 (Figure 1).
Figure 5: Average score of students on the assignment topic in a proctored exam conducted two weeks after the assignment. In total, 112 students participated in the optional reflection exercise and were uniformly distributed across conditions. We find that students who engaged in a questionnaire-based or LLM-based reflection exercise seemed to perform better than students in the condition that revised the important slides.
...and 1 more figures
