Adapting AlignScore Metric for Factual Consistency Evaluation of Text in Russian: A Student Abstract
Mikhail Zimin, Milyausha Shamsutdinova, Georgii Andriushchenko
TL;DR
This work addresses the scarcity of factual-consistency evaluation tools for Russian by adapting the English-focused AlignScore metric into AlignRuScore. The approach combines Russian translations of English benchmarks spanning multiple alignment tasks with native Russian data, and uses multi-task learning to train a RuBERT-based unified alignment function. The resulting metric performs strongly on Russian datasets and can be used to evaluate the factual consistency of Russian-language LLM outputs; the datasets, code, and model checkpoints are released. This offers a practical path toward robust multilingual factual-consistency evaluation and its downstream applications in Russian NLP.
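The sketch below shows one way a unified alignment function can be turned into a score for a whole generated text, assuming AlignRuScore keeps the original AlignScore inference procedure: split the context into chunks, split the claim into sentences, take the maximum alignment over chunks for each sentence, then average over sentences. The names `alignment_score`, `chunk_words`, `split_sentences`, `toy_align`, and the `align` callable are hypothetical. `align` stands in for the fine-tuned RuBERT model, and chunking by whitespace-separated words (rather than by model tokens) and splitting sentences with a regular expression are simplifications.

```python
import re
from typing import Callable, List

# Hypothetical interface: align(context_chunk, claim_sentence) -> probability in [0, 1]
# that the sentence is supported by the chunk (a stand-in for the RuBERT model).
AlignFn = Callable[[str, str], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (a simplification)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_words(text: str, max_words: int = 350) -> List[str]:
    """Split the context into consecutive windows of at most max_words words."""
    words = text.split()
    chunks = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    return chunks or [""]


def alignment_score(context: str, claim: str, align: AlignFn, max_words: int = 350) -> float:
    """Max alignment over context chunks for each claim sentence, then mean over sentences."""
    chunks = chunk_words(context, max_words)
    sentences = split_sentences(claim)
    if not sentences:
        raise ValueError("claim must contain at least one sentence")
    per_sentence = [max(align(chunk, sent) for chunk in chunks) for sent in sentences]
    return sum(per_sentence) / len(per_sentence)


def toy_align(chunk: str, sentence: str) -> float:
    """Toy stand-in for the model: share of sentence words that also occur in the chunk."""
    chunk_vocab = set(chunk.lower().split())
    tokens = sentence.lower().split()
    return sum(tok in chunk_vocab for tok in tokens) / len(tokens)


context = "Москва является столицей России."
claim = "Москва является столицей России. Париж находится в Германии."
print(alignment_score(context, claim, toy_align))  # → 0.5
```

Here the first claim sentence is fully supported and the second is not, so the averaged score is 0.5. Taking the maximum over chunks lets a sentence count as supported if any part of a long context backs it, while averaging over sentences penalizes each unsupported statement in the claim.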
Abstract
Ensuring factual consistency in generated text is crucial for reliable natural language processing applications. However, tools for evaluating factual consistency in Russian are scarce, as existing tools focus primarily on English corpora. To bridge this gap, we introduce AlignRuScore, a comprehensive adaptation of the AlignScore metric for Russian. To adapt the metric, we fine-tune a RuBERT-based alignment model, equipped with task-specific classification and regression heads, on native Russian datasets and on Russian translations of English datasets. Our results demonstrate that a unified alignment metric can be successfully ported to Russian, laying the groundwork for robust multilingual factual consistency evaluation. We release the translated corpora, model checkpoints, and code to support further research.
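To make the multi-task setup more concrete, here is a minimal pure-Python sketch of how task-specific heads could be combined during training. It assumes the AlignScore design of a 3-way classification head (aligned / neutral / contradiction), a binary classification head, and a regression head, with each training example scored only by the head that matches its source task. The `Example` type, the task names, the loss weights, the per-example averaging, and the position of the "aligned" class in `aligned_probability` are illustrative assumptions, not the authors' released code; in practice the logits would come from heads on top of RuBERT and the loss would be computed in a deep learning framework.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical task names mirroring AlignScore's three output heads.
CLASSIFICATION_TASKS = {"3way", "binary"}
REGRESSION_TASK = "regression"


@dataclass
class Example:
    task: str                # which head this example is routed to
    logits: Sequence[float]  # raw outputs of that head (a single value for regression)
    target: float            # class index for classification, score for regression


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the gold class."""
    return -math.log(softmax(logits)[label])


def multitask_loss(batch: List[Example], weights: Dict[str, float]) -> float:
    """Weighted mean loss in which each example only trains the head of its own task."""
    total = 0.0
    for ex in batch:
        if ex.task in CLASSIFICATION_TASKS:
            loss = cross_entropy(ex.logits, int(ex.target))
        elif ex.task == REGRESSION_TASK:
            loss = (ex.logits[0] - ex.target) ** 2  # squared error
        else:
            raise ValueError(f"unknown task: {ex.task}")
        total += weights[ex.task] * loss
    return total / len(batch)


def aligned_probability(three_way_logits: Sequence[float], aligned_index: int = 0) -> float:
    """Inference-time score: probability of the 'aligned' class from the 3-way head."""
    return softmax(three_way_logits)[aligned_index]
```

Routing each example to a single head lets datasets with incompatible label spaces (3-way NLI labels, binary labels, continuous similarity scores) share one encoder, which is what makes a single unified alignment function possible; the per-task weights are a tunable assumption.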
