The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling

Ruochen Zhang, Qinan Yu, Matianyu Zang, Carsten Eickhoff, Ellie Pavlick

TL;DR

The paper investigates whether large language models encode linguistic structure in shared internal circuits across languages. By applying mechanistic interpretability methods (path patching and information flow routes) to indirect object identification (IOI) and past-tense tasks in English and Chinese, across multilingual and monolingual models, it finds that common syntactic processes are handled by largely shared circuits, while language-specific morphology engages specialized components such as FFNs in English. This reveals a trade-off in which models leverage universal processing patterns yet preserve language-specific differences, which informs cross-lingual transfer and multilingual model design. The work provides a principled framework for dissecting multilingual internal representations and points to future research on exploiting circuit overlap to improve multilingual robustness and safety.

Abstract

We employ new tools from mechanistic interpretability in order to ask whether the internal structure of large language models (LLMs) shows correspondence to the linguistic structures which underlie the languages on which they are trained. In particular, we ask (1) when two languages employ the same morphosyntactic processes, do LLMs handle them using shared internal circuitry? and (2) when two languages require different morphosyntactic processes, do LLMs handle them using different internal circuitry? Using English and Chinese multilingual and monolingual models, we analyze the internal circuitry involved in two tasks. We find evidence that models employ the same circuit to handle the same syntactic process independently of the language in which it occurs, and that this is the case even for monolingual models trained completely independently. Moreover, we show that multilingual models employ language-specific components (attention heads and feed-forward networks) when needed to handle linguistic processes (e.g., morphological marking) that only exist in some languages. Together, our results provide new insights into how LLMs trade off between exploiting common structures and preserving linguistic differences when tasked with modeling multiple languages simultaneously.

Paper Structure

This paper contains 28 sections, 11 figures, 1 table.

Figures (11)

  • Figure 1: Attention head activation frequency comparison between English IOI, Chinese IOI and English control tasks. Left: Comparison between English IOI and Chinese IOI on BLOOM (with specific functional heads shown in different marker types). Right: Comparison between English IOI and English Tense on BLOOM. Pairs of heads with non-zero activation frequency are shaded according to the difference in their values; darker gray means a smaller difference. The left graph has more shaded head pairs than the right, indicating greater similarity between the activated heads.
  • Figure 2: Illustration of the English and Chinese IOI circuits on the BLOOM-560M model. Blue-framed rectangles or text mark the components used in English IOI and orange-framed ones mark those used in Chinese IOI. Heads marked in black are the functional heads shared between English and Chinese IOI. The English and Chinese IOI circuits are highly similar: they use mostly the same components, with the same functionality, to implement the algorithm.
  • Figure 3: English and Chinese IOI task circuits. Blue highlights mark the components and inputs used for the English task in GPT2. Orange highlights are for CPM. Components with both blue and orange frames are shared between the English and Chinese circuits. Within each of these components, blue text indicates the heads in the English circuit and orange text those in the Chinese circuit. The Negative Name Mover heads appear only in the English model. The Copy Suppression Head appears only in the Chinese model. Despite being trained on completely different data, the models still implement largely similar algorithms to solve tasks in different languages.
  • Figure 4: (a) Head activation frequency comparison between the English and Chinese past tense tasks on Qwen. (b) Illustration of the tense circuits in Qwen, which contain the three components we focus on: copy heads, past tense heads and feed-forward layers. (c) Top promoted tokens from the copy head and the past tense head. Words highlighted in either blue or orange are expected model predictions. Gray tokens are the specific tokens that the head promotes. Here the copy head is 21.3 and the past tense head is 19.4. As shown in (a), the copy head has a similarly high activation frequency in both languages, whereas the past tense head is frequently activated only in English.
  • Figure 5: Illustration of the effects of ablating the late feed-forward layers (layers 20-23). Left: Comparison of zero-rank rate changes between the English and Chinese past tense tasks. Right: Comparison of top predicted tokens before and after ablation. We observe that English performance is significantly impacted by the ablation, while Chinese is barely affected. Qualitative results on the right show that the correct token moves backward in the ranking in English but remains the top prediction in Chinese.
  • ...and 6 more figures
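
The two quantities that the captions above rely on can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the shading in Figure 1 reflects the absolute difference in activation frequency for heads that are non-zero in both tasks, and that the "zero-rank rate" in Figure 5 is the fraction of examples in which the correct token is ranked first (rank 0). The function names `shared_head_differences` and `zero_rank_rate` are hypothetical, and all numbers are toy values rather than results from the paper.

```python
from typing import Dict, Sequence, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def shared_head_differences(
    freq_a: Dict[Head, float], freq_b: Dict[Head, float]
) -> Dict[Head, float]:
    """Absolute frequency difference for heads that are non-zero in both tasks.

    Assumption: this mirrors the shading rule described for Figure 1.
    """
    return {
        head: abs(freq_a[head] - freq_b[head])
        for head in freq_a
        if freq_a[head] > 0 and freq_b.get(head, 0.0) > 0
    }


def zero_rank_rate(
    logits: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Fraction of examples whose target token has rank 0.

    Assumption: rank is the number of tokens scored strictly higher
    than the target, so rank 0 means the target is the top prediction.
    """
    hits = 0
    for row, target in zip(logits, targets):
        rank = sum(1 for value in row if value > row[target])
        hits += int(rank == 0)
    return hits / len(targets)


if __name__ == "__main__":
    # Toy activation frequencies for two tasks (made-up values).
    task_a = {(0, 1): 0.75, (1, 2): 0.5, (2, 0): 0.0}
    task_b = {(0, 1): 0.5, (1, 2): 0.0, (2, 0): 0.25}
    print(shared_head_differences(task_a, task_b))

    # Toy logits over a 3-token vocabulary for three examples.
    toy_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.9]]
    toy_targets = [1, 0, 1]
    print(zero_rank_rate(toy_logits, toy_targets))
```

Under these assumptions, comparing `zero_rank_rate` before and after zeroing out a set of feed-forward layers would give the kind of per-language change that Figure 5 plots; the ablation itself depends on the model implementation and is not sketched here.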