Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Prayag Tiwari, Jing Qin
TL;DR
Edu-Values introduces the first benchmark for evaluating how well LLMs align with Chinese educational values across seven dimensions. Its 1,418-question dataset spans multiple-choice, multi-modal QA, subjective analysis, adversarial prompt, and Chinese traditional culture questions. Model responses are scored with a hybrid, human-supervised framework worth 1,950 points in total, reported on a normalized scale. Across 21 state-of-the-art LLMs, Chinese models generally outperform English ones, with Qwen-2-72B leading, yet gaps remain in teachers' professional ethics and professional philosophy. The study also shows that using Edu-Values as an external knowledge repository for retrieval-augmented generation (RAG) can substantially improve educational value alignment, offering a practical path toward safer and more principled educational AI in Chinese contexts.
Abstract
In this paper, we present Edu-Values, the first Chinese education values evaluation benchmark, which covers seven core values: professional philosophy, teachers' professional ethics, education laws and regulations, cultural literacy, educational knowledge and skills, basic competencies, and subject knowledge. We meticulously design 1,418 questions, covering multiple-choice, multi-modal question answering, subjective analysis, adversarial prompts, and Chinese traditional culture (short answer) questions. We conduct human-feedback-based automatic evaluation of 21 state-of-the-art (SoTA) LLMs and highlight three main findings: (1) due to differences in educational culture, Chinese LLMs outperform English LLMs, with Qwen 2 ranking first with a score of 81.37; (2) LLMs often struggle with teachers' professional ethics and professional philosophy; (3) leveraging Edu-Values to build an external knowledge repository for RAG significantly improves LLMs' alignment. These findings demonstrate the effectiveness of the proposed benchmark.
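To make finding (3) concrete, the following is a minimal sketch of how a benchmark-derived knowledge repository could feed a RAG prompt. It is an illustration under stated assumptions, not the pipeline used in the paper: the repository entries, the bag-of-words cosine retriever, and the names `REPOSITORY`, `retrieve`, and `build_prompt` are all hypothetical, and a real system would more likely use a dense embedding retriever and pass the prompt to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical repository entries; NOT taken from the Edu-Values dataset.
REPOSITORY = [
    "Teachers should treat all students fairly and respect their dignity.",
    "Education laws guarantee every child the right to compulsory education.",
    "Traditional culture questions cover classical poetry and historical figures.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, repository, k=1):
    """Return the k repository entries most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        repository,
        key=lambda entry: cosine(query_vec, Counter(tokenize(entry))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, repository, k=1):
    """Prepend the retrieved entries to the question as reference material."""
    context = "\n".join(retrieve(question, repository, k))
    return f"Reference material:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Calling `build_prompt("How should teachers treat students fairly?", REPOSITORY)` places the teacher-ethics entry ahead of the question, so the model answers with the relevant value statement in context.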
