EduNLP: Towards a Unified and Modularized Library for Educational Resources

Zhenya Huang, Yuting Ning, Longhu Qin, Shiwei Tong, Shangzi Xue, Tong Xiao, Xin Lin, Jiayu Liu, Qi Liu, Enhong Chen, Shijing Wang

TL;DR

EduNLP tackles the challenge of educational resource understanding by delivering a unified, modular toolkit that standardizes data via the Standard Item Format (SIF) and decouples the workflow into Data, Preprocess, Model, and Evaluation modules with a Pipeline layer. It provides a Model Hub and a pair of vector interfaces (I2V and T2V) to support easy deployment of domain-specific, education-focused models (e.g., Edu-BERT, Edu-RoBERTa) across eight subjects and five downstream tasks. Through ten implemented models across four categories and large-scale pre-training on multi-subject datasets, EduNLP demonstrates improved performance over baselines and enhances reproducibility and extensibility for education AI research and applications. The framework enables researchers and practitioners to quickly preprocess, train, evaluate, and deploy educational resources, laying groundwork for future capabilities such as educational resource generation.

Abstract

Educational resource understanding is vital to online learning platforms, which have seen growing adoption in recent years. However, researchers and developers often struggle when using existing general-purpose natural language toolkits or domain-specific models. This raises the need for an effective, easy-to-use toolkit that benefits education-related AI research and applications. To bridge this gap, we present EduNLP, a unified, modularized, and extensive library focused on educational resource understanding. In the library, we decouple the whole workflow into four key modules with consistent interfaces: data configuration, processing, model implementation, and model evaluation. We also provide a configurable pipeline that unifies data usage and model usage in standard ways and that users can customize to their own needs. The current version provides 10 typical models from four categories and 5 common downstream evaluation tasks in the education domain, covering 8 subjects. The project is released at: https://github.com/bigdata-ustc/EduNLP.
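
To make the idea of decoupled stages behind consistent interfaces concrete, the following is a minimal, self-contained sketch in plain Python. It does not use EduNLP and does not reflect its actual API: every name here (`Item`, `load_items`, `tokenize`, `vectorize`, `predict`, `Pipeline`, `accuracy`) is hypothetical, the `$...$` formula delimiter is an assumption of this toy, and the "model" is a hand-written rule rather than a pre-trained network.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Item:
    """Toy educational item (hypothetical format, not the real SIF)."""
    text: str
    label: int  # toy task: 1 = item contains a formula, 0 = it does not


def load_items() -> List[Item]:
    # Data stage: a tiny in-memory dataset stands in for a configured corpus.
    return [
        Item("Solve $x^2 = 4$ for $x$", 1),
        Item("Name the capital of France", 0),
        Item("Compute the area when $r = 2$", 1),
    ]


def tokenize(item: Item) -> List[str]:
    # Preprocess stage: collapse each $...$ span into one [FORMULA] token
    # and lowercase/split the remaining text on whitespace.
    tokens: List[str] = []
    for part in re.split(r"(\$[^$]*\$)", item.text):
        if len(part) >= 2 and part.startswith("$") and part.endswith("$"):
            tokens.append("[FORMULA]")
        else:
            tokens.extend(part.lower().split())
    return tokens


def vectorize(tokens: Sequence[str]) -> List[float]:
    # Model stage, part 1: two hand-made features, not a learned embedding.
    n_formula = sum(1 for t in tokens if t == "[FORMULA]")
    return [float(n_formula), float(len(tokens))]


def predict(vector: Sequence[float]) -> int:
    # Model stage, part 2: a fixed rule standing in for a trained classifier.
    return 1 if vector[0] > 0 else 0


class Pipeline:
    """Chains stages; each stage consumes the previous stage's output."""

    def __init__(self, *stages: Callable):
        self.stages = stages

    def __call__(self, item):
        value = item
        for stage in self.stages:
            value = stage(value)
        return value


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    # Evaluation stage: fraction of predictions that match the labels.
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


items = load_items()
pipe = Pipeline(tokenize, vectorize, predict)
preds = [pipe(item) for item in items]
score = accuracy(preds, [item.label for item in items])
```

Because each stage is an ordinary callable, any one of them can be swapped (for example, a different tokenizer) without touching the others, which is the property the abstract attributes to the library's modular design.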

Paper Structure

This paper contains 39 sections, 3 figures, and 10 tables.

Figures (3)

  • Figure 1: The overall framework of our library EduNLP.
  • Figure 2: An example of SIF item.
  • Figure 3: An illustrative usage flow of EduNLP.