A Dual-Fusion Cognitive Diagnosis Framework for Open Student Learning Environments

Yuanhao Liu, Shuo Liu, Yimeng Liu, Jingwen Yang, Hong Qian

TL;DR

To standardize the original text corpus and make it easier for cognitive diagnosis models (CDMs) to capture relevant textual semantic information, this paper proposes an exercise-refiner and a concept-refiner, which use large language models to make the text of exercises and knowledge concepts more coherent and better suited to educational scenarios.

Abstract

The cognitive diagnosis model (CDM) is a fundamental, upstream component of intelligent education. It aims to infer students' mastery levels from their historical response logs. However, existing CDMs usually follow the ID-based embedding paradigm, which often diminishes their effectiveness in open student learning environments, mainly because they cannot readily infer new students' mastery levels or make use of new exercises or knowledge concepts without retraining. Textual semantic information, with its unified feature space and easy accessibility, can help alleviate this issue. Unfortunately, directly incorporating semantic information may not benefit CDMs, since it does not capture response-relevant features and thus discards the individual characteristics of each student. To this end, this paper proposes a dual-fusion cognitive diagnosis framework (DFCD) to address the challenge of aligning two different modalities, i.e., textual semantic features and response-relevant features. Specifically, in DFCD, we first propose the exercise-refiner and concept-refiner, which use large language models to make the exercises and knowledge concepts more coherent and reasonable. Then, DFCD encodes the refined features with text embedding models to obtain the semantic information. For response-relevant features, we propose a novel response matrix that fully incorporates the information in the response logs. Finally, DFCD uses a dual-fusion module to merge the features of the two modalities. The resulting representations support inference in open student learning environments and can also be plugged into existing CDMs. Extensive experiments on real-world datasets show that DFCD achieves superior performance by integrating different modalities and adapts well to open student learning environments.
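The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the +1/-1/0 encoding of the response matrix, the linear projections, and the averaging step used as "fusion" are all assumptions made for illustration, and the random vectors merely stand in for the output of a text embedding model. The function names `build_response_matrix` and `fuse` are hypothetical.

```python
import numpy as np

def build_response_matrix(logs, n_students, n_exercises):
    """Student x exercise matrix from (student, exercise, correct) logs.
    Assumed encoding: +1 correct, -1 incorrect, 0 unobserved."""
    R = np.zeros((n_students, n_exercises), dtype=np.float32)
    for student, exercise, correct in logs:
        R[student, exercise] = 1.0 if correct else -1.0
    return R

def fuse(text_emb, resp_feat, W_t, W_r):
    """Hypothetical fusion: project both modalities into a shared space
    and average them. DFCD's actual dual-fusion module may differ."""
    return 0.5 * (text_emb @ W_t + resp_feat @ W_r)

rng = np.random.default_rng(0)

# Toy logs: exercise 2 has no responses yet, i.e. it is "unseen".
logs = [(0, 0, 1), (0, 1, 0), (1, 1, 1)]
R = build_response_matrix(logs, n_students=2, n_exercises=3)

# Stand-in for text-embedding output, one row per exercise (assumption).
text_emb = rng.standard_normal((3, 8))

# Response-relevant features of exercises: their columns in R.
resp_feat = R.T

# Random projection weights; in a real model these would be learned.
W_t = rng.standard_normal((8, 4)) / np.sqrt(8)
W_r = rng.standard_normal((2, 4)) / np.sqrt(2)

fused = fuse(text_emb, resp_feat, W_t, W_r)
```

In this toy setup, the unseen exercise has an all-zero response row, so its fused representation comes entirely from the text branch. That is the intuition behind using textual semantics for new exercises, while the response branch carries the student-specific signal for items that do have logs.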

Paper Structure

This paper contains 35 sections, 11 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: The left subfigure shows the process of cognitive diagnosis (CD). The middle subfigure shows the results of the motivation study on the MOOC-Radar dataset. The right subfigure shows the t-SNE visualization of exercise text from the NeurIPS2020 dataset, embedded with text-embedding-ada-002, with each exercise point colored according to its corresponding concept. For brevity, only subfigures for selected datasets are shown; similar results for other datasets are presented in the Appendix.
  • Figure 2: The overall framework of DFCD. (a) Textual feature constructor. Examples in it are all from real data. (b) Response feature constructor. (c) Detailed components of DFCD.
  • Figure 3: Comparison of DFCD with different integrated CDMs. US means the scenario of unseen student, UE means the scenario of unseen exercise, and UC means the scenario of unseen concept.
  • Figure 4: Directly utilized text embeddings may not benefit CDMs.
  • Figure 5: The visualization of exercise text features.
  • ...and 9 more figures