
How Language Directions Align with Token Geometry in Multilingual LLMs

JaeSeong Kim, Suan Lee

TL;DR

The paper investigates how language information is encoded in multilingual LLMs and whether the pretraining data distribution shapes representation geometry. Using a probing framework that covers all 268 Transformer layers of six models, it applies linear and nonlinear probes and introduces Token–Language Alignment to quantify, for five languages, how language encoding evolves across layers and how language directions align with the vocabulary. It finds that, across all six models, language signals become almost fully linearly separable in the first Transformer block, with only a small gap between linear and nonlinear probes. It also reveals structural imprinting: the pretraining data distribution strongly shapes how language directions align with the vocabulary (e.g., English-centric vs. Chinese-inclusive models). These results indicate that multilingual representations are governed by latent directions formed during pretraining rather than by surface-script features, with practical implications for data balancing and fairness; the proposed diagnostics offer a toolset for evaluating and guiding multilingual corpus design.
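
A minimal sketch of layer-wise probing, assuming synthetic Gaussian vectors in place of real hidden states; the source does not specify the probe architectures, training setup, or data. A multinomial logistic-regression classifier serves as the linear probe, and a 1-nearest-neighbour classifier stands in for the nonlinear probe purely for illustration. All function names (`train_linear_probe`, `knn_accuracy`, `synthetic_layer`, `probe_layer`) are hypothetical. Layer 0 is simulated with no language signal and layer 1 with a strong one, so the printed accuracies only echo the qualitative pattern described above and are not the paper's numbers.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(w, b, x):
    """Linear probe logits: one weight vector and one bias per language."""
    return [sum(wj * xj for wj, xj in zip(wc, x)) + bc for wc, bc in zip(w, b)]


def train_linear_probe(xs, ys, n_classes, epochs=200, lr=0.1):
    """Multinomial logistic regression fitted with full-batch gradient descent."""
    dim = len(xs[0])
    w = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(xs)
    for _ in range(epochs):
        gw = [[0.0] * dim for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(xs, ys):
            probs = softmax(scores(w, b, x))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                gb[c] += err
                for j in range(dim):
                    gw[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(dim):
                w[c][j] -= lr * gw[c][j] / n
    return w, b


def linear_accuracy(w, b, xs, ys):
    """Fraction of samples whose highest-scoring language is the true one."""
    hits = 0
    for x, y in zip(xs, ys):
        s = scores(w, b, x)
        hits += s.index(max(s)) == y
    return hits / len(xs)


def knn_accuracy(train_xs, train_ys, xs, ys):
    """1-nearest-neighbour classifier, used here only as a simple nonlinear stand-in probe."""
    hits = 0
    for x, y in zip(xs, ys):
        dists = [sum((a - t) ** 2 for a, t in zip(x, tx)) for tx in train_xs]
        hits += train_ys[dists.index(min(dists))] == y
    return hits / len(xs)


def synthetic_layer(n_per_lang, n_langs, dim, signal, rng):
    """Hypothetical hidden states: a per-language centroid scaled by `signal`, plus unit Gaussian noise."""
    centers = [[rng.gauss(0, 1) * signal for _ in range(dim)] for _ in range(n_langs)]
    xs, ys = [], []
    for lang in range(n_langs):
        for _ in range(n_per_lang):
            xs.append([c + rng.gauss(0, 1) for c in centers[lang]])
            ys.append(lang)
    return xs, ys


def probe_layer(xs, ys, n_classes, rng):
    """Random 50/50 train/test split; returns (linear accuracy, nonlinear accuracy) on the test half."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    tr_x, tr_y = [xs[i] for i in idx[:half]], [ys[i] for i in idx[:half]]
    te_x, te_y = [xs[i] for i in idx[half:]], [ys[i] for i in idx[half:]]
    w, b = train_linear_probe(tr_x, tr_y, n_classes)
    return linear_accuracy(w, b, te_x, te_y), knn_accuracy(tr_x, tr_y, te_x, te_y)


rng = random.Random(0)
N_LANGS = 5  # five languages, matching the number probed in the paper
results = {}
for layer, signal in [(0, 0.0), (1, 2.0)]:  # toy stand-ins for Layer 0 and Layer 1
    xs, ys = synthetic_layer(40, N_LANGS, 8, signal, rng)
    lin, nonlin = probe_layer(xs, ys, N_LANGS, rng)
    results[layer] = (lin, nonlin)
    print(f"layer {layer}: linear={lin:.2f} nonlinear={nonlin:.2f} gap={nonlin - lin:+.2f}")
```

On real activations, `synthetic_layer` would be replaced by hidden states extracted from each Transformer layer; a small nonlinear-minus-linear gap at every layer is what "almost fully linear" separability looks like in this setup.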

Abstract

Multilingual LLMs demonstrate strong performance across diverse languages, yet there has been limited systematic analysis of how language information is structured within their internal representation space and how it emerges across layers. We conduct a comprehensive probing study on six multilingual LLMs, covering all 268 transformer layers, using linear and nonlinear probes together with a new Token–Language Alignment analysis to quantify the layer-wise dynamics and geometric structure of language encoding. Our results show that language information becomes sharply separated in the first transformer block (+76.4 ± 8.2 percentage points from Layer 0 to 1) and remains almost fully linearly separable throughout model depth. We further find that the alignment between language directions and vocabulary embeddings is strongly tied to the language composition of the training data. Notably, Chinese-inclusive models achieve a ZH Match@Peak of 16.43%, whereas English-centric models achieve only 3.90%, revealing a 4.21× structural imprinting effect. These findings indicate that multilingual LLMs distinguish languages not by surface script features but by latent representational structures shaped by the training corpus. Our analysis provides practical insights for data composition strategies and fairness in multilingual representation learning. All code and analysis scripts are publicly available at: https://github.com/thisiskorea/How-Language-Directions-Align-with-Token-Geometry-in-Multilingual-LLMs.
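
The abstract does not define Token–Language Alignment or Match@Peak, so the following is only one plausible reading, sketched under explicit assumptions: a language direction is the mean hidden state of that language minus the mean of all other languages at a given layer; vocabulary tokens are ranked by cosine similarity to that direction; the match rate is the share of the top-k tokens assigned to the language (for ZH, here, tokens made only of CJK Unified Ideographs); and Match@Peak is the maximum match rate over layers. The function names, the toy two-dimensional embeddings, and the token-to-language rule are all hypothetical. The last lines reproduce the reported 4.21× imprinting factor from the two ZH Match@Peak scores stated in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def language_direction(states, labels, lang):
    """Hypothetical language direction: mean state of `lang` minus mean state of the other languages."""
    dim = len(states[0])
    inside = [s for s, lab in zip(states, labels) if lab == lang]
    outside = [s for s, lab in zip(states, labels) if lab != lang]
    mean_in = [sum(s[j] for s in inside) / len(inside) for j in range(dim)]
    mean_out = [sum(s[j] for s in outside) / len(outside) for j in range(dim)]
    return [a - b for a, b in zip(mean_in, mean_out)]


def is_cjk(token):
    """Assumed ZH membership rule: every character is in CJK Unified Ideographs (U+4E00 to U+9FFF)."""
    return bool(token) and all("\u4e00" <= ch <= "\u9fff" for ch in token)


def match_rate(direction, vocab, k, belongs):
    """Share of the k tokens most cosine-similar to `direction` that belong to the language."""
    ranked = sorted(vocab, key=lambda item: cosine(direction, item[1]), reverse=True)
    return sum(belongs(token) for token, _ in ranked[:k]) / k


def match_at_peak(states_per_layer, labels, lang, vocab, k, belongs):
    """Hypothetical Match@Peak: the best match rate over layers and the layer where it occurs."""
    rates = [match_rate(language_direction(s, labels, lang), vocab, k, belongs)
             for s in states_per_layer]
    peak = max(range(len(rates)), key=rates.__getitem__)
    return rates[peak], peak


# Toy 2-D vocabulary embeddings and hidden states (made-up numbers, not from the paper).
vocab = [("中", [1.0, 0.1]), ("国", [0.9, 0.2]), ("语言", [0.8, -0.1]),
         ("the", [0.1, 1.0]), ("and", [0.0, 0.9]), ("of", [-0.2, 0.8])]
labels = ["zh", "zh", "en", "en"]
states_per_layer = [
    [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]],  # layer 0: ZH direction points toward Latin-script tokens
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],  # layer 1: ZH direction points toward CJK tokens
]
peak_rate, peak_layer = match_at_peak(states_per_layer, labels, "zh", vocab, k=3, belongs=is_cjk)
print(peak_rate, peak_layer)  # 1.0 1

# The imprinting factor in the abstract is the ratio of the two reported ZH Match@Peak scores.
imprinting = 16.43 / 3.90
print(f"{imprinting:.2f}x")  # 4.21x
```

Taking the maximum over layers rather than the last layer's value is itself an assumption about what "@Peak" means; a different direction estimator (for example, a linear probe's weight vector) would slot into `language_direction` without changing the rest.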

Paper Structure

This paper contains 12 sections, 4 equations, and 2 tables.