Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore
Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh
TL;DR
We introduce a simple pipeline for multilingual factuality evaluation by applying FActScore (Min et al., 2023) to diverse languages, and we provide comprehensive guidelines on multilingual factual evaluation for regionally diverse topics.
Abstract
Evaluating the factuality of long-form text generated by large language models (LLMs) is an important challenge. There has recently been a surge of interest in factuality evaluation for English, but little is known about the factuality evaluation of multilingual LLMs, especially when it comes to long-form generation. We introduce a simple pipeline for multilingual factuality evaluation by applying FActScore (Min et al., 2023) to diverse languages. In addition to evaluating multilingual factual generation, we evaluate the factual accuracy of long-form text generation on topics that reflect regional diversity. We also examine the feasibility of running the FActScore pipeline using non-English Wikipedia and provide comprehensive guidelines on multilingual factual evaluation for regionally diverse topics.
