WangchanLion and WangchanX MRC Eval

Wannaphong Phatthiyaphaibun; Surapon Nonesung; Patomporn Payoungkhamdee; Peerat Limkonchotiwat; Can Udomcharoenchaikit; Jitkapat Sawatphol; Chompakorn Chaksangchaichot; Ekapol Chuangsuwanich; Sarana Nutanong

TL;DR

The paper introduces WangchanLion, an open-source Thai instruction-following model built on SEA-LION and aimed at machine reading comprehension (MRC) in Thai. It is fine-tuned on a mix of English and Thai instruction data with QLoRA, a parameter-efficient fine-tuning method, and achieves strong MRC performance on Thai benchmarks while keeping its training data transparent and its results reproducible. Because traditional QA metrics capture only part of answer quality, the authors propose WangchanX MRC Eval, a holistic evaluation framework with three components: (1) F1-based extractive QA scoring, (2) human judgments of correctness, helpfulness, conciseness, and contextuality, and (3) automated LLM-based evaluation with GPT-4. Experiments show that WangchanLion's F1 scores are competitive with Thai baselines, while the human and automated evaluations reveal a trade-off between concise correctness and contextual richness. The work contributes an open, benchmark-ready pipeline and a scalable evaluation methodology that can guide future work on Thai MRC and on evaluation practices for open-source LLMs.
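
The F1-based component scores a predicted answer by its token overlap with a reference answer. A minimal sketch of SQuAD-style token F1 follows, assuming a simplified normalization (lowercasing, removing ASCII punctuation, collapsing whitespace) and a pluggable `tokenize` function. The names `normalize`, `f1_single`, and `token_f1` are hypothetical, and the exact normalization and Thai tokenizer used in WangchanX MRC Eval are not stated here.

```python
import string
from collections import Counter
from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Simplified SQuAD-style normalization (no English article removal);
    an assumption, not necessarily the report's exact rule.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def f1_single(
    prediction: str,
    reference: str,
    tokenize: Callable[[str], list[str]] = str.split,
) -> float:
    """Harmonic mean of token precision and recall for one reference."""
    pred_tokens = tokenize(normalize(prediction))
    ref_tokens = tokenize(normalize(reference))
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def token_f1(
    prediction: str,
    references: Iterable[str],
    tokenize: Callable[[str], list[str]] = str.split,
) -> float:
    """Best F1 over all reference answers, as in SQuAD-style scoring."""
    return max(f1_single(prediction, ref, tokenize) for ref in references)
```

Because Thai does not separate words with spaces, the default `str.split` would treat a whole Thai phrase as a single token; for Thai answers one would instead pass a word segmenter, for example `lambda s: word_tokenize(s, keep_whitespace=False)` using PyThaiNLP's `word_tokenize`.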

Abstract

This technical report describes the development of WangchanLion, an instruction-fine-tuned model focused on Machine Reading Comprehension (MRC) in Thai. The model is based on SEA-LION and fine-tuned on a collection of instruction-following datasets. To promote open research and reproducibility, we publicly release all training data, code, and the final model weights under the Apache-2.0 license. To assess the model's contextual understanding, we conducted extensive experiments on two Thai MRC datasets, XQuAD and Iapp_wiki_qa_squad. The results demonstrate the model's ability to comprehend the context and produce answers faithful to the reference answers in 0-shot and 1-shot settings. In addition, our evaluation goes beyond traditional MRC metrics: we propose a new evaluation scheme that assesses each answer's correctness, helpfulness, conciseness, and contextuality. Our code is publicly available at https://github.com/vistec-AI/WangchanLion.
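
The four-criterion scheme can be pictured as one record per judged answer. The sketch below assumes a binary pass/fail label per criterion and reports, for each criterion, the fraction of answers that pass. The 0/1 scale, the `Judgment` record, and `criterion_rates` are hypothetical illustrations, not the report's actual rubric or scoring procedure; the same aggregation would apply whether labels come from human annotators or from an LLM judge such as GPT-4.

```python
from dataclasses import dataclass

# Criteria named in the report; the 0/1 (fail/pass) scale is an assumption.
CRITERIA = ("correctness", "helpfulness", "conciseness", "contextuality")


@dataclass
class Judgment:
    """Hypothetical judgment of one model answer by a human or LLM judge."""

    scores: dict[str, int]  # criterion name -> 0 (fail) or 1 (pass)

    def __post_init__(self) -> None:
        # Reject records that miss a criterion or use a score outside {0, 1}.
        for criterion in CRITERIA:
            if self.scores.get(criterion) not in (0, 1):
                raise ValueError(f"invalid or missing score for {criterion!r}")


def criterion_rates(judgments: list[Judgment]) -> dict[str, float]:
    """Fraction of judged answers that pass each criterion."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return {
        criterion: sum(j.scores[criterion] for j in judgments) / len(judgments)
        for criterion in CRITERIA
    }


# Toy inputs for illustration only, not results from the report.
judgments = [
    Judgment({"correctness": 1, "helpfulness": 1, "conciseness": 0, "contextuality": 1}),
    Judgment({"correctness": 1, "helpfulness": 0, "conciseness": 1, "contextuality": 0}),
]
print(criterion_rates(judgments))
# {'correctness': 1.0, 'helpfulness': 0.5, 'conciseness': 0.5, 'contextuality': 0.5}
```

Reporting one rate per criterion, rather than a single combined score, keeps visible the kind of trade-off the evaluation targets, such as an answer that is correct and concise but weak on contextuality.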
