Leveraging Prompts in LLMs to Overcome Imbalances in Complex Educational Text Data

Jeanne McClure, Machi Shimmei, Noboru Matsuda, Shiyan Jiang

TL;DR

This work tackles the challenge of imbalanced and small educational datasets for classifying cognitive engagement (CE). It applies Large Language Models augmented with Assertion Enhanced Few-Shot Learning (AEFL) via an Iterative In-context Learning Prompt Engineering design, combining General CoT, reasoning-based Few-Shot prompts, and targeted assertions. Results show that LLMs with these prompts outperform traditional ML methods, especially for minority CE classes (e.g., Constructive), achieving up to a 32% relative improvement in F1-score and an 11.94% accuracy boost on the sensitivity-analysis subset, while addressing lexical ambiguity and contextual understanding limitations. The findings indicate substantial potential for LLM-driven CE analysis in education and highlight the need to broaden evaluation contexts and further refine AEFL strategies.
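The TL;DR names three prompt ingredients: a general chain-of-thought (CoT) instruction, reasoning-based few-shot examples, and targeted assertions. The sketch below shows one way these could be assembled into a single classification prompt. It is not the authors' prompt: the section ordering, the wording, the `Example` and `build_prompt` names, and the sample label and assertion are all hypothetical, chosen only to make the structure concrete.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot demonstration: a student response, the reasoning, and its label."""
    response: str
    reasoning: str
    label: str


def build_prompt(task: str, examples: list[Example], assertions: list[str], query: str) -> str:
    """Assemble a prompt from a CoT instruction, reasoning-based few-shot
    examples, and targeted assertions (hypothetical layout and wording)."""
    # General CoT instruction appended to the task description.
    parts = [task, "Think step by step before giving a label."]
    # Reasoning-based few-shot examples: each shows the reasoning before the label.
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\nResponse: {ex.response}\nReasoning: {ex.reasoning}\nLabel: {ex.label}"
        )
    # Targeted assertions: explicit rules aimed at recurring misclassifications.
    if assertions:
        parts.append("Rules to follow:\n" + "\n".join(f"- {a}" for a in assertions))
    # The new response to classify; the model continues from "Reasoning:".
    parts.append(f"Response: {query}\nReasoning:")
    return "\n\n".join(parts)


# Hypothetical usage: labels, responses, and the assertion are invented for illustration.
prompt = build_prompt(
    task="Classify the cognitive engagement level of the student response.",
    examples=[
        Example(
            response="I connected the graph to what we saw in class and predicted the next value.",
            reasoning="The student goes beyond the given material by generating a new prediction.",
            label="Constructive",
        )
    ],
    assertions=["If the student generates an idea not present in the material, do not label it Active."],
    query="I copied the definition from the slide.",
)
print(prompt)
```

Keeping the assertions in their own block makes the iterative design easy to follow: after each round, a new rule can be appended to `assertions` without touching the examples.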

Abstract

In this paper, we explore the potential of Large Language Models (LLMs) with assertions to mitigate imbalances in educational datasets. Traditional models often fall short in such contexts, largely because of the complexity and nuance of the data. The problem is especially pronounced in education, where students' open responses vary considerably in their level of cognitive engagement. To test this, we applied an existing assertion-based prompt engineering technique within an 'Iterative - ICL PE Design Process', comparing traditional Machine Learning (ML) models against LLMs augmented with assertions (N=135). We further conducted a sensitivity analysis on a subset (n=27), examining how model performance varied across iterations in terms of classification metrics and cognitive engagement levels. Our findings reveal that LLMs with assertions significantly outperform traditional ML models, particularly for cognitive engagement levels with minority representation, with up to a 32% increase in F1-score. Additionally, our sensitivity study indicates that incorporating targeted assertions into the LLM improves its performance on the subset by 11.94%. This improvement comes primarily from correcting errors caused by the model's limited understanding of context and its difficulty resolving lexical ambiguities in student responses.
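Because the headline result is a per-class F1 gain on minority classes, it helps to see how per-class F1 and a relative improvement are computed. The sketch below uses plain Python; the labels and predictions are invented toy data, not the study's data, and the relative-change formula follows the TL;DR's description of the 32% as a relative improvement, (F1_new - F1_old) / F1_old.

```python
def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Compute F1 separately for each class (one-vs-rest)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def relative_change(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`; undefined when old is zero."""
    if old == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (new - old) / old


# Toy, imbalanced data (hypothetical): "Constructive" is the minority class.
y_true   = ["Active", "Active", "Active", "Active", "Constructive", "Constructive"]
baseline = ["Active", "Active", "Active", "Active", "Active", "Constructive"]
llm      = ["Active", "Active", "Active", "Constructive", "Constructive", "Constructive"]

f1_base = per_class_f1(y_true, baseline)["Constructive"]  # P = 1.0, R = 0.5
f1_llm = per_class_f1(y_true, llm)["Constructive"]        # P = 2/3, R = 1.0
print(f"{relative_change(f1_llm, f1_base):.0%}")  # prints 20%
```

Reporting F1 per class, rather than overall accuracy, keeps a majority class like "Active" in this toy set from masking how well the minority class is recognized.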

Paper Structure (16 sections, 1 equation, 5 figures, 4 tables)

Figures (5)

  • Figure 1: ICL Prompt Engineering Design Process to optimize the accuracy of LLMs in classifying educational data using ICL, CoT, and AEFL.
  • Figure 2: Performance Metrics Summary by Cognitive Engagement Class showing results for each cognitive class.
  • Figure 3: Relative Performance Heatmap by Cognitive Engagement Class
  • Figure 4: The left image does not include a targeted assertion, while the right image does; the assertion improves the model output so that it correctly predicts the cognitive engagement level of the student's text.
  • Figure 5: Percentage Change in Metrics for Each Class Across Experiments