Evaluating Multimodal Generative AI with Korean Educational Standards

Sanghee Park; Geewook Kim

TL;DR

KoNET addresses the lack of Korean multimodal educational benchmarks by converting four national tests into a multimodal VQA dataset, with human error data included for KoCSAT. The study benchmarks a wide range of open- and closed-source LLMs and MLLMs, using Chain-of-Thought prompting and OCR, and applies an LLM-as-a-Judge framework to standardize evaluation. Key findings show that performance improves with model size, that open-source models exhibit a larger gap in Korean contexts, and that linguistic and cultural specificity significantly affects AI performance. By releasing an open-source dataset builder and detailed analyses of human versus AI error patterns, KoNET aims to drive reproducible, language-aware progress in multimodal educational AI and tutoring applications.

Abstract

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate multimodal generative AI systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), its Middle (KoMGED) and High (KoHGED) counterparts, and the College Scholastic Ability Test (KoCSAT). These exams are renowned for their rigorous standards and diverse questions, facilitating a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models (open-source, open-access, and closed APIs) by examining difficulties, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.
