Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs

Kartik Ravisankar; Hyojung Han; Marine Carpuat

TL;DR

The paper investigates cross-lingual representation alignment in decoder-only LLMs and its relation to multilingual performance. It introduces per-sample metrics, namely the Discriminative Alignment Index (DALI, including DALI_S) and a task-specific MEXA_T, and evaluates them on Belebele, XStoryCloze, XCOPA, and FLORES-based translation tasks using Llama 3.1 8B. Alignment to English correlates strongly with task accuracy at the language level. At the instance level, however, alignment predicts correctness only on some tasks (notably Belebele) and in several translation directions, which suggests that alignment is a necessary but not sufficient condition for success. The work also highlights asymmetries between En→XX and XX→En translation, points to confounding factors and the limits of English-centric alignment, and suggests that more nuanced analyses of confidence and calibration are needed for robust multilingual reasoning.

Abstract

Large language models (LLMs) pre-trained predominantly on English text exhibit surprising multilingual capabilities, yet the mechanisms driving cross-lingual generalization remain poorly understood. This work investigates how the alignment of representations for text written in different languages correlates with LLM performance on natural language understanding tasks and translation tasks, both at the language and the instance level. For this purpose, we introduce cross-lingual alignment metrics such as the Discriminative Alignment Index (DALI) to quantify the alignment at an instance level for discriminative tasks. Through experiments on three natural language understanding tasks (Belebele, XStoryCloze, XCOPA), and machine translation, we find that while cross-lingual alignment metrics strongly correlate with task accuracy at the language level, the sample-level alignment often fails to distinguish correct from incorrect predictions, exposing alignment as a necessary but insufficient condition for success.
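The contrast between language-level and instance-level analysis can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the binary `aligned` flag is a hypothetical stand-in for a per-sample alignment indicator (it is not the paper's definition of DALI), the function names are invented, and the toy records are synthetic rather than taken from the paper's results. The sketch only shows how per-language averages can correlate perfectly with accuracy while alignment carries no information about correctness within any single language.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def language_level_correlation(records):
    """Correlate per-language alignment rate with per-language accuracy.

    records maps a language code to a list of (aligned, correct) pairs,
    where both entries are 0 or 1. The `aligned` flag is a hypothetical
    per-sample alignment indicator, not the paper's metric.
    """
    align_rates = [mean([a for a, _ in s]) for s in records.values()]
    accuracies = [mean([c for _, c in s]) for s in records.values()]
    return pearson(align_rates, accuracies)


def instance_level_gap(samples):
    """Accuracy on aligned samples minus accuracy on unaligned samples."""
    aligned = [c for a, c in samples if a == 1]
    unaligned = [c for a, c in samples if a == 0]
    return mean(aligned) - mean(unaligned)


# Synthetic toy data (NOT results from the paper): 8 samples per language.
toy = {
    "lang_low": [(1, 1), (1, 0)] + [(0, 1)] * 3 + [(0, 0)] * 3,
    "lang_mid": [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] * 3 + [(0, 0)],
    "lang_high": [(1, 1)] * 6 + [(0, 1)] * 2,
}

corr = language_level_correlation(toy)
gaps = {lang: instance_level_gap(s) for lang, s in toy.items()}
print(f"language-level correlation: {corr:.2f}")
print(f"instance-level accuracy gaps: {gaps}")
```

In this toy setup the languages with higher alignment rates also have higher accuracy, yet within each language aligned and unaligned samples are answered correctly at the same rate, which mirrors the "necessary but insufficient" pattern described in the abstract.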
