Enhancing Student Performance Prediction on Learnersourced Questions with SGNN-LLM Synergy
Lin Ni, Sijie Wang, Zeyu Zhang, Xiaoxuan Li, Xianda Zheng, Paul Denny, Jiamou Liu
TL;DR
This paper addresses student performance prediction on learnersourced multiple-choice questions (MCQs) under noisy data and cold-start conditions, using a Signed Bipartite Graph Contrastive Learning (SBCL) framework augmented with LLM-derived semantic embeddings. The method uses graph augmentation and dual GNN encoders to learn edge signs, while the LLM supplies question-level knowledge points that are combined with structural embeddings for robust prediction. Key contributions include formalizing sign prediction on signed bipartite graphs, introducing inter-view and intra-view contrastive learning, and validating semantic augmentation on five PeerWise datasets, where the method outperforms baselines, including on F1 score. The work improves robustness and personalization on learnersourcing platforms by exploiting both the network structure and the semantic content of questions.
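To make the idea of joining semantic and structural embeddings concrete, here is a minimal sketch, not the authors' implementation. It assumes the two question embeddings are joined by simple concatenation and that the edge sign is scored by a single logistic layer. The names `concat` and `predict_sign`, the vector sizes, and the weights are all hypothetical and chosen for illustration only.

```python
import math


def concat(structural, semantic):
    """Join a structural (graph) embedding with a semantic (LLM) embedding.

    Concatenation is an assumption; the summary only says the two are joined.
    """
    return list(structural) + list(semantic)


def predict_sign(student_vec, question_vec, weights, bias=0.0):
    """Score a student-question pair with a logistic layer.

    Returns (sign, probability), where sign is +1 for a predicted correct
    answer and -1 for a predicted incorrect answer.
    """
    features = list(student_vec) + list(question_vec)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    prob_correct = 1.0 / (1.0 + math.exp(-logit))
    return (1 if prob_correct >= 0.5 else -1), prob_correct


# Toy cold-start case: the question has no graph signal yet, so its
# structural part is all zeros and only the semantic part carries information.
student = [0.2, -0.1]
question_structural = [0.0, 0.0]
question_semantic = [0.5, 0.3, -0.4]
question = concat(question_structural, question_semantic)

# Hypothetical weights; in practice these would be learned.
weights = [1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.5]
sign, prob = predict_sign(student, question, weights)
```

In this toy setup the prediction still depends on the question because the semantic part is non-zero, which is the intuition behind using LLM embeddings when graph data is sparse.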
Abstract
Learnersourcing offers great potential for scalable education through student content creation. However, predicting student performance on learnersourced questions, which is essential for personalizing the learning experience, is challenging because student-generated data is inherently noisy. Moreover, while conventional graph-based methods can capture the complex network of student-question interactions, they often fall short under cold-start conditions, where limited student engagement with questions yields sparse data. To address both challenges, we introduce a strategy that integrates Signed Graph Neural Networks (SGNNs) with Large Language Model (LLM) embeddings. Our method models student answers as a signed bipartite graph and adds a contrastive learning framework that improves noise resilience. The LLM generates foundational question embeddings, which are especially useful in cold-start scenarios where graph data is limited. Experiments on five real-world datasets from the PeerWise platform show that our method outperforms baselines in predictive accuracy and robustness.
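As an illustration of the signed bipartite graph described above, the following sketch maps answer records to signed student-question edges and then draws two perturbed views of the graph, as a contrastive setup might. It is a sketch under stated assumptions: the record format, the function names, and the use of random edge dropping as the augmentation are not taken from the paper.

```python
import random


def build_signed_bipartite_graph(records):
    """Map (student, question, is_correct) records to signed edges.

    A correct answer becomes a +1 edge and an incorrect answer a -1 edge.
    If a pair appears more than once, the last record wins (an assumption).
    """
    edges = {}
    for student, question, is_correct in records:
        edges[(student, question)] = 1 if is_correct else -1
    return edges


def augment(edges, drop_prob, rng):
    """Return a perturbed view that drops each edge independently.

    Random edge dropping is an assumed augmentation, used here only to
    show how two views of the same graph could be produced.
    """
    return {pair: sign for pair, sign in edges.items()
            if rng.random() >= drop_prob}


# Hypothetical answer log: (student id, question id, answered correctly?)
records = [
    ("s1", "q1", True),
    ("s1", "q2", False),
    ("s2", "q1", True),
    ("s3", "q2", False),
]
graph = build_signed_bipartite_graph(records)

rng = random.Random(0)  # fixed seed so the views are reproducible
view_a = augment(graph, 0.3, rng)
view_b = augment(graph, 0.3, rng)
```

Each view keeps a subset of the original edges with their signs unchanged, so encoders trained to agree across views must rely on structure that survives the perturbation.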
