Adapting AlignScore Metric for Factual Consistency Evaluation of Text in Russian: A Student Abstract

Mikhail Zimin, Milyausha Shamsutdinova, Georgii Andriushchenko

TL;DR

This work tackles the scarcity of Russian factual-consistency evaluation tools by adapting the English-focused AlignScore into AlignRuScore. It combines translated benchmarks across multiple alignment tasks with native Russian data and trains a RuBERT-based unified alignment function using multi-task learning. The resulting metric delivers strong performance on Russian datasets and provides a means to evaluate Russian-language LLM outputs, supported by released datasets, code, and model checkpoints. This establishes a practical pathway for robust multilingual factual-consistency evaluation and downstream applications in Russian NLP.

Abstract

Ensuring factual consistency in generated text is crucial for reliable natural language processing applications. However, there is a lack of evaluation tools for factual consistency in Russian texts, as existing tools primarily focus on English corpora. To bridge this gap, we introduce AlignRuScore, a comprehensive adaptation of the AlignScore metric for Russian. To adapt the metric, we fine-tuned a RuBERT-based alignment model with task-specific classification and regression heads on Russian and translated English datasets. Our results demonstrate that a unified alignment metric can be successfully ported to Russian, laying the groundwork for robust multilingual factual consistency evaluation. We release the translated corpora, model checkpoints, and code to support further research.
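
The abstract describes one alignment model trained with task-specific classification and regression heads. As a rough illustration of how such a multi-task objective might be combined, the sketch below routes each training example to the loss matching its head and averages the results. The specific losses (cross-entropy and squared error), the equal weighting, the example values, and all function and field names are assumptions made for illustration; they are not taken from the paper or its released code, and a real setup would take its logits from the RuBERT encoder rather than hard-coded numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the gold class (assumed classification loss)."""
    return -math.log(softmax(logits)[label])


def squared_error(prediction, target):
    """Squared error for a single scalar score (assumed regression loss)."""
    return (prediction - target) ** 2


def multitask_loss(batch):
    """Average per-example losses, choosing the loss by each example's head.

    Equal weighting across heads is an assumption of this sketch.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    total = 0.0
    for example in batch:
        if example["head"] == "classification":
            total += cross_entropy(example["logits"], example["label"])
        elif example["head"] == "regression":
            total += squared_error(example["prediction"], example["target"])
        else:
            raise ValueError(f"unknown head: {example['head']}")
    return total / len(batch)


# Hypothetical mixed batch: one classification example and one regression example.
batch = [
    {"head": "classification", "logits": [2.0, 0.1, -1.0], "label": 0},
    {"head": "regression", "prediction": 0.8, "target": 1.0},
]
loss = multitask_loss(batch)
print(f"combined loss: {loss:.4f}")
```

Mixing examples from different tasks in one batch lets a shared encoder receive gradients from every head at each step, which is one common way to realize multi-task training.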

Paper Structure

This paper contains 18 sections, 5 tables.