Enhancing classroom teaching with LLMs and RAG

Elizabeth A Mullins, Adrian Portillo, Kristalys Ruiz-Rohena, Aritran Piplai

TL;DR

This work investigates how RAG pipelines that use course materials as a data source might help students in K-12 education, and concludes that Reddit is not a good source of data for answering questions about cybersecurity threats.

Abstract

Large Language Models (LLMs) have become a valuable source of information for everyday inquiries. However, their training data quickly becomes out of date once training ends, making Retrieval-Augmented Generation (RAG) a useful tool for supplying more recent or pertinent data. In this work, we investigate how RAG pipelines, with course materials serving as a data source, might help students in K-12 education. The initial research uses Reddit as a data source for up-to-date cybersecurity information. Chunk size is varied to determine the optimal amount of context needed to generate accurate answers. After running the experiment for different chunk sizes, answer correctness was evaluated using RAGAs; average answer correctness did not exceed 50 percent for any chunk size. This suggests that Reddit is not a good source to mine for data for questions about cybersecurity threats. The methodology was successful in evaluating the data source, which has implications for its use in evaluating the effectiveness of educational resources.
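
The chunk-size comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how chunks are measured, so the word-based splitting, the `overlap` parameter, and the function names `chunk_text` and `average_by_chunk_size` are all assumptions made for illustration. The correctness scores themselves would come from an evaluator such as RAGAs, which is not called here.

```python
from collections import defaultdict
from statistics import mean


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Assumption: chunks are measured in whitespace-separated words;
    a real pipeline might count characters or tokens instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def average_by_chunk_size(scores):
    """Group (chunk_size, score) pairs and return the mean score per size.

    `scores` is assumed to hold one answer-correctness value per
    evaluated question, produced by an external evaluator.
    """
    grouped = defaultdict(list)
    for size, score in scores:
        grouped[size].append(score)
    return {size: mean(values) for size, values in sorted(grouped.items())}
```

In such a setup, each candidate chunk size would be used to re-index the source text, the same question set would be answered against each index, and `average_by_chunk_size` would summarize the resulting scores for comparison.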

Paper Structure

This paper contains 4 sections and 2 figures.

Figures (2)

  • Figure 1: RAG Pipeline
  • Figure 2: Average answer correctness by chunk size.