Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Mengyu Bu, Yang Feng

Abstract

Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified semantic space, they often struggle to reliably interface this knowledge with low-resource or unseen languages. Fortunately, pretrained encoder-decoder translation models already possess balanced multilingual capability, suggesting a natural complement to LLMs. In this work, we propose XBridge, a compositional encoder-LLM-decoder architecture that offloads multilingual understanding and generation to external pretrained translation models, while preserving the LLM as an English-centric core for general knowledge processing. To address the resulting representation misalignment across models, we introduce lightweight cross-model mapping layers and an optimal transport-based alignment objective, enabling fine-grained semantic consistency for multilingual generation. Experiments on four LLMs across multilingual understanding, reasoning, summarization, and generation indicate that XBridge outperforms strong baselines, especially on low-resource and previously unseen languages, without retraining the LLM.
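As an illustration of what an optimal transport-based alignment objective between two models' token representations could look like, the following is a minimal sketch and not the paper's implementation. The cosine cost, uniform marginals, entropic regularization with Sinkhorn iterations, the linear mapping `W`, and all function names are assumptions introduced here for illustration only.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x (n, d) and y (m, d)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized transport plan with uniform marginals (assumed)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # mass on each source token
    b = np.full(m, 1.0 / m)   # mass on each target token
    K = np.exp(-cost / reg)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def ot_alignment_loss(src, tgt, reg=0.1):
    """Transport cost <plan, cost> between two sets of token vectors."""
    cost = cosine_cost(src, tgt)
    plan = sinkhorn_plan(cost, reg)
    return float((plan * cost).sum()), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enc_states = rng.normal(size=(5, 12))   # hypothetical encoder outputs
    llm_states = rng.normal(size=(7, 16))   # hypothetical LLM hidden states
    W = rng.normal(size=(12, 16)) * 0.1     # hypothetical linear mapping layer
    loss, plan = ot_alignment_loss(enc_states @ W, llm_states)
    print(f"alignment loss: {loss:.4f}, plan shape: {plan.shape}")
```

In this sketch the two sequences may have different lengths, which is one reason a transport plan is a convenient way to compare token-level representations from models with different tokenizers. In a real training setup the loss would be computed in a differentiable framework so that gradients reach the mapping layer.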

Paper Structure (50 sections, 9 equations, 12 figures, 17 tables)

Figures (12)

  • Figure 1: Overview of XBridge. Pretrained multilingual NMT models provide broad language coverage but limited general reasoning capability, while English-centric LLMs excel at general reasoning yet struggle with low-resource or unseen languages. XBridge harmonizes these strengths through model composition, offloading multilingual processing to the pretrained multilingual model while leveraging the LLM as a knowledge core.
  • Figure 2: Left: XBridge composes a pretrained multilingual encoder-decoder with an LLM via lightweight mapping layers for multilingual understanding and generation, keeping the LLM frozen as a knowledge core. Right: A three-stage training strategy progressively aligns heterogeneous representations and adapts the encoder and decoder.
  • Figure 3: Multilingual reasoning accuracy on MGSM and multilingual summarization ROUGE-L on XL-Sum, with complete results in Appendix \ref{detailed_results}. Models with the same base LLM share the same color scheme, where lighter shades denote baselines and darker shades denote XBridge. "XBridge-LLM" refers to English reasoning by the LLM, while "XBridge-Dec" refers to multilingual reasoning by the composed decoder. For XL-Sum, since the baselines produce English-only summaries, we translate them into target languages using NLLB-200-1.3B for evaluation.
  • Figure 4: Ablation analysis of XBridge. We compare different ablated variants of XBridge: encoder-only augmentation "w/o Decoder", loss ablation "w/o OT", removal of stage 1 "w/o Stage 1", and joint optimization of stages 2 and 3 "Joint Stage 2&3". "Lrl", "Hrl", and "Avg" denote low-resource, high-resource, and average performance, respectively.
  • Figure 5: Cross-lingual generalization to 42 untuned languages in FLORES-101. Left: X$\rightarrow$En direction. Right: En$\rightarrow$X direction. We directly evaluate the ablation variants described in Section \ref{ablation_text}. Appendix \ref{detailed_results} lists the included untuned languages and provides detailed results.
  • ...and 7 more figures