Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation

Zifei FeiFei Han; Jionghao Lin; Ashish Gurung; Danielle R. Thomas; Eason Chen; Conrad Borchers; Shivang Gupta; Kenneth R. Koedinger

TL;DR

The paper addresses the challenge of assessing novice tutors' social-emotional learning (SEL) competencies in one-on-one math tutoring. It investigates four prompting strategies for GPT-3.5 and GPT-4, including a Retrieval-Augmented Generation (RAG) approach that leverages an embeddings-based information base of transcripts and SEL principles. Results show that RAG prompts yield higher accuracy with fewer hallucinations, and GPT-4 generally outperforms GPT-3.5, while RAG remains the most cost-efficient method for real-time assessment. The work demonstrates the potential for scalable, automated tutor training feedback and lays groundwork for future tools such as a lesson recommender system to tailor professional development for tutors.

Abstract

One-on-one tutoring is an effective instructional method for enhancing learning, yet its efficacy hinges on tutor competencies. Novice math tutors often prioritize content-specific guidance and neglect aspects such as social-emotional learning. Social-emotional learning promotes equity and inclusion and nurtures relationships with students, which is crucial for holistic student development. Assessing tutors' competencies accurately and efficiently can drive the development of tailored tutor training programs. However, evaluating novice tutors' ability during real-time tutoring remains challenging, as it typically requires experts in the loop. To address this challenge, this preliminary study harnesses Generative Pre-trained Transformers (GPT), such as the GPT-3.5 and GPT-4 models, to automatically assess tutors' ability to use social-emotional tutoring strategies. The study also reports on the financial costs and considerations of employing these models in real time and at scale for automated assessment. It examined four prompting strategies: two basic zero-shot prompts, a Tree of Thought prompt, and a Retrieval-Augmented Generation (RAG) based prompt. The results indicate that the RAG prompt produced more accurate assessments (measured by the level of hallucination and correctness in the generated assessment texts) at lower financial cost than the other strategies evaluated. These findings inform the development of personalized tutor training interventions to enhance the educational effectiveness of tutored learning.
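
To make the RAG-based prompting idea concrete, the following is a minimal sketch of the general retrieve-then-prompt pattern. It is not the authors' pipeline. The knowledge-base entries, the prompt wording, and the function names (`embed`, `cosine`, `retrieve`, `build_rag_prompt`) are all hypothetical, and a toy bag-of-words vector stands in for a learned embedding model so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base of SEL principles (illustrative only,
# not taken from the paper's actual information base).
KNOWLEDGE_BASE = [
    "Praise effort and process rather than innate ability.",
    "Acknowledge student frustration before correcting a math error.",
    "Ask open-ended questions to let the student explain their reasoning.",
]


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(transcript, documents, k=2):
    """Assemble a prompt that grounds the assessment in retrieved principles."""
    principles = "\n".join(f"- {doc}" for doc in retrieve(transcript, documents, k))
    return (
        "You are assessing a tutor's use of social-emotional strategies.\n"
        f"Relevant principles:\n{principles}\n\n"
        f"Transcript:\n{transcript}\n\n"
        "Assessment:"
    )
```

In a setup like the one the abstract describes, the assembled prompt would then be sent to a model such as GPT-3.5 or GPT-4. Because only the retrieved principles are included rather than the whole knowledge base, the prompt stays short, which is one plausible reason a retrieval-based prompt could cost less per assessment.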
