
Concept Space Alignment in Multilingual LLMs

Qiwei Peng, Anders Søgaard

TL;DR

The experiments show that multilingual LLMs suffer from two familiar weaknesses: generalization works best for languages with similar typology and for abstract concepts.

Abstract

Multilingual large language models (LLMs) seem to generalize somewhat across languages. We hypothesize that this is a result of implicit vector space alignment. Evaluating such alignment, we find that larger models exhibit very high-quality linear alignments between corresponding concepts in different languages. Our experiments show that multilingual LLMs suffer from two familiar weaknesses: generalization works best for languages with similar typology and for abstract concepts. For some models, e.g., the Llama-2 family, prompt-based embeddings align better than word embeddings, but the projections are less linear. This observation holds across almost all model families and indicates that prompt-based methods partly break some of the implicitly learned alignments.
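The idea of a linear alignment between concept spaces can be illustrated with a short sketch. It assumes that each concept already has one embedding per language (how those embeddings are extracted from the LLM, via word embeddings or prompting, is not shown) and that the map is fitted with orthogonal Procrustes on a seed dictionary, a common choice for linear cross-lingual mapping; the paper's exact mapping may differ. The embeddings are synthetic, and the helper name `fit_orthogonal_map` is hypothetical.

```python
import numpy as np


def fit_orthogonal_map(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: the orthogonal W minimising ||src @ W - tgt||_F."""
    # SVD of the cross-covariance between paired source and target vectors.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


rng = np.random.default_rng(0)
n_pairs, dim = 3500, 64

# Synthetic stand-ins for concept embeddings in two languages (assumption):
# the target space is a random rotation of the source space plus a little noise.
src = rng.standard_normal((n_pairs, dim))
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
tgt = src @ rotation + 0.05 * rng.standard_normal((n_pairs, dim))

# Fit on a 3,000-pair seed dictionary, then check the remaining 500 pairs.
W = fit_orthogonal_map(src[:3000], tgt[:3000])
mapped, gold = src[3000:] @ W, tgt[3000:]

# Cosine similarity between each mapped source vector and its gold translation.
cos = np.sum(mapped * gold, axis=1) / (
    np.linalg.norm(mapped, axis=1) * np.linalg.norm(gold, axis=1)
)
print(f"mean held-out cosine: {cos.mean():.3f}")
```

Because the synthetic target space is built as a rotation of the source space, a single linear map explains the correspondence almost perfectly here by construction; the held-out cosine is a simple way to see how much of the cross-lingual correspondence a linear map captures.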

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, and 24 tables.

Figures (3)

  • Figure 1: Examples of four parallel WordNet concepts, aligned across 7 languages.
  • Figure 2: Performance (P@1) of different LLMs on the concept alignment evaluation with a seed dictionary of 3,000 concepts. X-axis: languages, divided into three groups: Group 1 is Indo-European; Group 2 comprises non-Indo-European languages written in Latin script; Group 3 comprises non-Indo-European languages not written in Latin script. Y-axis: Precision@1.
  • Figure 3: Performance (P@1) of different LLMs on the concept alignment evaluation with a seed dictionary of 3,000 pairs. X-axis: languages, divided into three groups: Group 1 is Indo-European; Group 2 comprises non-Indo-European languages written in Latin script; Group 3 comprises non-Indo-European languages not written in Latin script. Y-axis: Precision@1.
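The evaluation summarized in Figures 2 and 3 can be sketched as cosine nearest-neighbour retrieval scored by Precision@1, averaged within the three language groups defined in the captions. This is a minimal sketch under stated assumptions: the mapped source vectors are synthetic (target vectors plus noise) rather than outputs of a fitted alignment, the language codes are illustrative and not necessarily the paper's language set, and `language_group` and `precision_at_1` are hypothetical helper names.

```python
import numpy as np


def language_group(indo_european: bool, latin_script: bool) -> int:
    """Grouping rule from the figure captions: 1 = Indo-European,
    2 = non-Indo-European in Latin script, 3 = non-Indo-European, non-Latin script."""
    if indo_european:
        return 1
    return 2 if latin_script else 3


def precision_at_1(mapped: np.ndarray, tgt: np.ndarray, gold: np.ndarray) -> float:
    """Fraction of queries whose cosine nearest neighbour in tgt is the gold index."""
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    pred = (mapped @ tgt.T).argmax(axis=1)
    return float((pred == gold).mean())


# Illustrative languages (hypothetical selection): (Indo-European?, Latin script?)
LANGS = {"en": (True, True), "ru": (True, False), "fi": (False, True), "ja": (False, False)}

rng = np.random.default_rng(0)
tgt = rng.standard_normal((500, 32))
gold = np.arange(500)  # query i should retrieve target concept i

scores_by_group: dict[int, list[float]] = {}
for lang, (ie, latin) in LANGS.items():
    # Synthetic mapped vectors: gold targets plus noise (not real model output).
    mapped = tgt + 0.1 * rng.standard_normal(tgt.shape)
    group = language_group(ie, latin)
    scores_by_group.setdefault(group, []).append(precision_at_1(mapped, tgt, gold))

for group, scores in sorted(scores_by_group.items()):
    print(f"Group {group}: mean P@1 = {np.mean(scores):.3f}")
```

Every language receives the same noise level here, so the group means are equal by construction and say nothing about the paper's findings; the sketch only shows how a per-language P@1 score rolls up into the three groups.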