LangBridge: Multilingual Reasoning Without Multilingual Supervision

Dongkeun Yoon, Joel Jang, Sungdong Kim, Seungone Kim, Sheikh Shafayat, Minjoon Seo

TL;DR

LangBridge presents a zero-shot method to extend reasoning capabilities of language models to multilingual tasks by aligning a multilingual encoder (mT5) with a reasoning LM (e.g., MetaMath or Orca) via a single trainable linear adapter, trained exclusively on English data. The approach relies on the language-agnostic properties of multilingual representations and uses a prefix-language-model objective, keeping the target LM frozen while optionally training the encoder. Across mathematical reasoning, code completion, logical reasoning, and commonsense reasoning, LangBridge yields substantial improvements in low-resource languages and, in several cases, matches or surpasses much larger multilingual baselines. This method reduces the need for multilingual supervision and demonstrates practical impact for multilingual reasoning, with public release of code and models. The findings suggest that language-neutral representations can be effectively transferred to target LMs, offering a scalable path to more inclusive AI systems for underrepresented languages.
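The prefix-language-model objective mentioned above can be illustrated with a toy loss computation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes the objective amounts to scoring only the tokens that follow the prefix, and the function name `prefix_lm_loss`, the `prefix_len` argument, and the probabilities are hypothetical illustration values.

```python
import math

def prefix_lm_loss(token_log_probs, prefix_len):
    """Mean negative log-likelihood over target positions only.

    token_log_probs: log-probability the model assigned to each position
    of the sequence [prefix | target]. Positions inside the prefix are
    context only and contribute nothing to the loss (an assumption of
    this sketch).
    """
    target = token_log_probs[prefix_len:]
    if not target:
        raise ValueError("sequence has no target tokens")
    return -sum(target) / len(target)

# Hypothetical per-position probabilities: two prefix positions, two targets.
log_probs = [math.log(0.9), math.log(0.1), math.log(0.5), math.log(0.25)]
loss = prefix_lm_loss(log_probs, prefix_len=2)
print(round(loss, 4))
```

Only the last two positions affect the result here, so a poorly predicted prefix position (probability 0.1) does not change the loss.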

Abstract

We introduce LangBridge, a zero-shot approach to adapt language models for multilingual reasoning tasks without multilingual supervision. LangBridge operates by bridging two models, each specialized in different aspects: (1) one specialized in understanding multiple languages (e.g., mT5 encoder) and (2) one specialized in reasoning (e.g., MetaMath). LangBridge connects the two models by introducing minimal trainable parameters between them. Despite utilizing only English data for training, LangBridge considerably enhances the performance of language models on low-resource languages across mathematical reasoning, code completion, logical reasoning, and commonsense reasoning. Our analysis suggests that the efficacy of LangBridge stems from the language-agnostic characteristics of multilingual representations. We publicly release our code and models.
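To make the bridging idea concrete, the sketch below projects encoder hidden states into a language model's embedding space with one linear map and prepends them to the model's own token embeddings. It is a minimal illustration in pure Python, not the released code: the widths `D_ENC` and `D_LM`, the helper names `linear` and `bridge`, and the random vectors are all hypothetical, and the prepending step is an assumption about how the projected states are consumed.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to one vector; weight has shape d_out x d_in."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def bridge(encoder_states, weight, bias, target_embeddings):
    """Project each encoder state to the LM width, then prepend the
    projected states to the LM's token embeddings as a soft prefix."""
    soft_prefix = [linear(h, weight, bias) for h in encoder_states]
    return soft_prefix + target_embeddings

random.seed(0)
D_ENC, D_LM = 4, 6  # hypothetical toy widths, far smaller than real models

# Stand-ins for multilingual encoder outputs (3 positions) and the
# LM's embeddings of the target tokens (2 positions).
encoder_states = [[random.gauss(0, 1) for _ in range(D_ENC)] for _ in range(3)]
target_embeddings = [[random.gauss(0, 1) for _ in range(D_LM)] for _ in range(2)]

# The only parameters this sketch treats as trainable: one linear layer.
weight = [[random.gauss(0, 0.1) for _ in range(D_ENC)] for _ in range(D_LM)]
bias = [0.0] * D_LM

lm_inputs = bridge(encoder_states, weight, bias, target_embeddings)
print(len(lm_inputs), len(lm_inputs[0]))  # 5 6
```

The output sequence has five positions of width `D_LM`: three projected encoder states followed by two target embeddings, which is the shape a frozen language model would accept as input embeddings.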

Paper Structure

This paper contains 52 sections, 1 equation, 9 figures, 15 tables.

Figures (9)

  • Figure 1: MGSM accuracy (%) of MetaMath models and models aligned with mT5-XL encoder (2B) via LangBridge (LB). In addition to the average (avg) accuracy, we also report the average accuracy of high-resource languages (hrl) and underrepresented languages (url), as classified by Shi et al. (2023).
  • Figure 2: Overview of LangBridge. Left: A multilingual encoder with an added linear layer is aligned with the target language model using English data. We keep the language model frozen, whereas the linear layer is trainable. The multilingual encoder is trainable when adapting pretrained LMs and frozen when adapting finetuned LMs. Right: At test time, a LangBridge model can effectively solve multilingual reasoning tasks.
  • Figure 3: First two principal components of pooled output representations obtained with 300 FLORES samples per language. Note that the scales of the two subplots differ.
  • Figure 4: Example of accidental translation by an Orca 2-LangBridge model prompted with the Snark subset of BBH-BN. Portions of the input prompt and several rationale steps in the output are truncated for brevity. Translations are provided (in blue, within parentheses) wherever required.
  • Figure 5: XCOPA accuracy (%) of Orca 2-7B models adapted with LangBridge using five different sizes of mT5 encoder. The dotted line shows the original performance of the target LM.
  • ...and 4 more figures