Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Haoyang Huang; Tianyi Tang; Dongdong Zhang; Wayne Xin Zhao; Ting Song; Yan Xia; Furu Wei

TL;DR

This work introduces Cross-Lingual-Thought Prompting (XLT), a language-independent in-context prompting framework designed to boost multilingual capabilities of large language models by eliciting cross-lingual reasoning through a structured, English-pivot template. XLT decomposes problem solving into roles, input formatting, cross-lingual thinking, task analysis, step-by-step solution, and strict output formatting, with optional few-shot demonstrations to further enhance performance. Through extensive experiments on seven benchmarks across 27 languages and multiple models, XLT delivers significant improvements over basic prompts and chain-of-thought prompts, notably reducing performance gaps between languages and achieving large gains in arithmetic reasoning and open-domain QA. The results also include thorough ablations and demonstrations showing the importance of instruction design, ordering, and rephrasing keywords, and suggest promising directions for broader model evaluation and multilingual prompting research.
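As a rough illustration of how such a template could be assembled, the sketch below lays out the six components in order (role, input, cross-lingual thinking, task analysis, step-by-step solving, output format) and fills them from request metadata. The instruction wording, the placeholder names, and the `build_prompt` helper are assumptions made for this example and are not the paper's exact template.

```python
from string import Template

# Hypothetical wording: the six lines mirror the components listed above
# (role, input, cross-lingual thinking, task analysis, step-by-step
# solving, output format). The phrasing is illustrative only.
XLT_SKETCH = Template(
    "I want you to act as a $task_name expert for $task_language.\n"
    "$task_input\n"
    "You should retell the $task_element in $pivot_language.\n"
    "You should $task_goal.\n"
    "You should think step by step to solve the task.\n"
    "You should format your final output as '$output_type: <$output_constraint>'."
)


def build_prompt(**meta: str) -> str:
    """Fill every placeholder; raises KeyError if a field is missing."""
    return XLT_SKETCH.substitute(**meta)


prompt = build_prompt(
    task_name="arithmetic reasoning",
    task_language="Chinese",
    task_input="Request: <question text in Chinese goes here>",
    task_element="request",
    pivot_language="English",
    task_goal="solve the problem and obtain a numeric answer",
    output_type="Answer",
    output_constraint="number",
)
print(prompt)
```

Because the instructions themselves stay in the pivot language and only the metadata changes, the same sketch can be reused for another task or source language by swapping the keyword arguments.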

Abstract

Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages. In this work, we introduce a simple yet effective method, called cross-lingual-thought prompting (XLT), to systematically improve the multilingual capability of LLMs. Specifically, XLT is a generic template prompt that stimulates cross-lingual and logical reasoning skills to enhance task performance across languages. We conduct comprehensive evaluations on 7 typical benchmarks related to reasoning, understanding, and generation tasks, covering both high-resource and low-resource languages. Experimental results show that XLT not only remarkably enhances the performance of various multilingual tasks but also significantly reduces the gap between the average performance and the best performance of each task in different languages. Notably, XLT brings over 10 points of average improvement in arithmetic reasoning and open-domain question-answering tasks.
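One simple way to read the "gap between the average performance and the best performance" is as the best per-language score minus the mean over languages. The sketch below computes that difference; the function name and the scores are made up purely for illustration, and the paper may define its gap measure differently.

```python
def average_to_best_gap(scores: dict[str, float]) -> float:
    """Best per-language score minus the mean score over all languages."""
    best = max(scores.values())
    average = sum(scores.values()) / len(scores)
    return best - average


# Made-up accuracies for three languages (not results from the paper).
basic_scores = {"en": 80.0, "zh": 60.0, "sw": 40.0}
xlt_scores = {"en": 82.0, "zh": 76.0, "sw": 70.0}

basic_gap = average_to_best_gap(basic_scores)
xlt_gap = average_to_best_gap(xlt_scores)
print(basic_gap, xlt_gap)
```

A smaller value means the weaker languages sit closer to the strongest one, which is the sense in which a prompting method can narrow the gap between languages.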

Paper Structure (36 sections, 1 equation, 24 figures, 13 tables)

Introduction
Cross-Lingual-Thought Prompting
Construction of XLT
Role Assigning
Task Inputting
Cross-lingual Thinking
Task Analyzing
CoT Task Solving
Output Formatting
XLT for Few-shot Learning
Experiments
Experimental Setups
Tasks and Benchmarks
Baselines
Basic Prompt
...and 21 more sections

Figures (24)

Figure 1: Overview of our method. Given a request, its associated meta information is filled into the placeholders of the XLT template to form the language-independent prompt, which is fed to the LLM to enhance the generation of responses in the desired format.
Figure 2: Illustration of the XLT template. Refer to Figure \ref{fig:XLTPExample} and the Appendix for instantiated examples.
Figure 3: Construction process for few-shot learning.
Figure 4: Illustrations of different demonstration input-output pairs in the few-shot learning.
Figure 5: A Chinese example of the MGSM benchmark using basic prompt and the corresponding outputs under the zero-shot setting.
...and 19 more figures
