Evaluating OpenAI GPT Models for Translation of Endangered Uralic Languages: A Comparison of Reasoning and Non-Reasoning Architectures

Yehor Tereshchenko; Mika Hämäläinen; Svitlana Myroniuk

TL;DR

The paper systematically evaluates OpenAI GPT models on translation between Finnish and four endangered Uralic languages (Komi-Zyrian, Moksha, Erzya, and Udmurt), comparing reasoning and non-reasoning architectures through refusal-rate analysis on a parallel literary corpus. It finds that reasoning models lower translation refusal rates by about 16 percentage points, with Moksha presenting the greatest challenge due to its morphological complexity and limited data. The o4-mini-2025-04-16 reasoning model performs best, with an 8.3% refusal rate, which points to the potential of reasoning models for low-resource language preservation. The work offers practical guidance for deploying LLM-based translation in endangered-language contexts and suggests future research directions for improving translation quality and evaluation metrics for reasoning systems.

Abstract

The evaluation of Large Language Models (LLMs) for translation tasks has primarily focused on high-resource languages, leaving a significant gap in understanding their performance on low-resource and endangered languages. This study compares OpenAI's GPT models, specifically examining the differences between reasoning and non-reasoning architectures when translating between Finnish and four low-resource Uralic languages: Komi-Zyrian, Moksha, Erzya, and Udmurt. Using a parallel corpus of literary texts, we measure each model's willingness to attempt translation by analyzing refusal rates across model architectures. Our findings reveal substantial differences between the two groups: refusal rates for reasoning models are 16 percentage points lower than those for non-reasoning models. The results offer guidance for researchers and practitioners working with Uralic languages and contribute to the broader understanding of reasoning model capabilities for endangered language preservation.
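The refusal-rate metric and the percentage-point comparison can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record format, the function names `refusal_rates` and `percentage_point_gap`, and the toy data are hypothetical and do not reproduce the study's pipeline or its results. In particular, how a response is judged to be a refusal is not specified here and is simply taken as a given boolean.

```python
from collections import defaultdict


def refusal_rates(records):
    """Return {model_name: refusal rate in percent}.

    `records` is an iterable of (model_name, refused) pairs, where
    `refused` is True if the model declined to translate.
    (Hypothetical format; the refusal judgment itself is assumed given.)
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for model, refused in records:
        totals[model] += 1
        if refused:
            refusals[model] += 1
    return {m: 100.0 * refusals[m] / totals[m] for m in totals}


def percentage_point_gap(rate_a, rate_b):
    """Absolute difference between two rates that are already in percent."""
    return rate_a - rate_b


# Toy data for illustration only; NOT results from the study.
toy_records = [
    ("non_reasoning", True),
    ("non_reasoning", False),
    ("non_reasoning", False),
    ("non_reasoning", False),
    ("reasoning", False),
    ("reasoning", False),
    ("reasoning", False),
    ("reasoning", False),
]

rates = refusal_rates(toy_records)
gap = percentage_point_gap(rates["non_reasoning"], rates["reasoning"])
print(rates, gap)
```

A gap in percentage points is an absolute difference between two rates, not a relative reduction, which is why the sketch subtracts the rates rather than dividing them.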
