Spivavtor: An Instruction Tuned Ukrainian Text Editing Model
Aman Saini, Artem Chernodub, Vipul Raheja, Vivek Kulkarni
TL;DR
Spivavtor addresses the need for high-quality Ukrainian text editing with instruction-tuned LLMs by creating Spivavtor-Instruct, a Ukrainian adaptation of CoEdIT. It unifies four editing tasks (GEC, Simplification, Coherence, Paraphrasing) through Ukrainian instructions derived from translated English datasets and expert verbalizers, and evaluates a diverse set of Encoder-Decoder and Decoder-only architectures across model scales. The results show that domain-specific instruction tuning yields substantial gains over generic instruction tuning, with Encoder-Decoder models and larger scales delivering the strongest performance and outperforming strong baselines, including GPT-4, on most tasks. The work provides a Ukrainian resource bundle and offers practical guidance on model choice, training, and evaluation for Ukrainian text editing in real-world NLP pipelines.
Abstract
We introduce Spivavtor, a dataset and instruction-tuned models for text editing in the Ukrainian language. Spivavtor is the Ukrainian-focused adaptation of the English-only CoEdIT model. Like CoEdIT, Spivavtor performs text editing tasks by following instructions, here written in Ukrainian. This paper describes the Spivavtor-Instruct dataset and the Spivavtor models. We evaluate Spivavtor on a variety of text editing tasks in Ukrainian, namely Grammatical Error Correction (GEC), Text Simplification, Coherence, and Paraphrasing, and demonstrate its superior performance on all of them. We publicly release our best-performing models and data as resources for the community to advance further research in this space.
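The instruction-following setup described above can be illustrated with a minimal sketch of how one training example might be assembled: a Ukrainian instruction (a "verbalizer") is prepended to the source text, and the edited text is the target. Everything specific here is an assumption made for illustration: the verbalizer phrasings, the task keys, the `build_example` helper, and the "instruction: text" layout (modeled loosely on CoEdIT-style prompts) are hypothetical and are not taken from the Spivavtor-Instruct dataset.

```python
import random

# Hypothetical verbalizers: illustrative Ukrainian phrasings only,
# NOT the actual instructions used in Spivavtor-Instruct.
VERBALIZERS = {
    "gec": [
        "Виправте граматичні помилки в цьому реченні",
        "Виправте граматику в цьому тексті",
    ],
    "simplification": [
        "Спростіть це речення",
        "Зробіть це речення простішим",
    ],
    "coherence": [
        "Зробіть цей текст більш зв'язним",
    ],
    "paraphrasing": [
        "Перефразуйте це речення",
    ],
}


def build_example(task, source, target, rng):
    """Pair a randomly chosen instruction for `task` with a source/target pair.

    The "instruction: source" layout is an assumed format for this sketch.
    """
    instruction = rng.choice(VERBALIZERS[task])
    return {
        "task": task,
        "input": f"{instruction}: {source}",
        "output": target,
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled instruction is reproducible
    example = build_example("gec", "Я ходив до школа.", "Я ходив до школи.", rng)
    print(example["input"])
    print(example["output"])
```

Sampling among several phrasings per task is one plausible way to keep a model from overfitting to a single instruction wording; whether and how the authors do this is described in the body of the paper, not in this sketch.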
