Understanding Cross-Lingual Alignment -- A Survey

Katharina Hämmerl, Jindřich Libovický, Alexander Fraser

TL;DR

Cross-lingual alignment aims to align multilingual representations to enable zero-shot transfer across languages. The paper presents two complementary views, a space-centric view (View I) and a task-centric view (View II), and offers a taxonomy of alignment strategies ranging from parallel-data objectives to contrastive learning, data augmentation, and representation transformations. It highlights key findings, notably the effectiveness of contrastive training, the non-necessity of pre-training alone for strong alignment, and the persistent advantage of parallel data when available. It also notes that related languages align better and that strong global alignment may come at the cost of language-specific information. The review then extends these insights to multilingual generative models, discussing where and how to align such models and the challenges of evaluating multilingual generation. It ultimately calls for methods that balance semantic alignment with language-specific signals to support fluent multilingual generation. Together, the work provides practical guidance for researchers and engineers designing cross-lingual reporting, retrieval, and generation systems across diverse languages.

Abstract

Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, has been an active field of research in recent years. We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field. We present different understandings of cross-lingual alignment and their limitations. We provide a qualitative summary of results from a large number of surveyed papers. Finally, we discuss how these insights may be applied not only to encoder models, where this topic has been heavily studied, but also to encoder-decoder or even decoder-only models, and argue that an effective trade-off between language-neutral and language-specific information is key.

Paper Structure (48 sections, 4 equations, 5 tables)