1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?

Yue Huang; Chenrui Fan; Yuan Li; Siyuan Wu; Tianyi Zhou; Xiangliang Zhang; Lichao Sun

TL;DR

We introduce a method that enhances the multilingual performance of LLMs by aggregating knowledge across languages. It combines a language-specific low-resource knowledge detector, a strategic language selection process, and mechanisms for answer replacement and integration.

Abstract

Large Language Models (LLMs) have garnered significant attention for their remarkable ability to process information across many languages. Despite these capabilities, they respond inconsistently to identical queries posed in different languages, which hinders further progress. This paper introduces a method that enhances the multilingual performance of LLMs by aggregating knowledge across languages. The approach combines a language-specific low-resource knowledge detector, a target language selection process, and mechanisms for answer replacement and integration. Our experiments show notable performance improvements, particularly a reduced performance disparity across languages. An ablation study confirms that each component of the method contributes significantly to these gains. This research highlights the inherent potential of LLMs to harmonize their multilingual capabilities and offers insights for further exploration.
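The control flow described above (detect low-resource knowledge, select a target language, then replace and integrate answers, as in the Figure 4 caption) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the detector, language selector, translator, and integration rule are all hypothetical injected callables, and the toy stubs in `build_toy_pipeline` (including the rule "prefer the cross-lingual answer when the direct answer is `unknown`") are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CrossLingualPipeline:
    """Hypothetical sketch of the detect -> select -> replace -> integrate flow."""

    detect: Callable[[str], bool]              # does the query involve low-resource knowledge?
    select_language: Callable[[str], str]      # language most likely to answer well (assumed)
    answer: Callable[[str], str]               # the LLM itself: query -> answer
    translate: Callable[[str, str, str], str]  # (text, src_lang, tgt_lang) -> text
    integrate: Callable[[str, str], str]       # (original, cross-lingual) -> final answer

    def run(self, query: str, query_lang: str) -> str:
        # No low-resource knowledge detected: answer directly in the query language.
        if not self.detect(query):
            return self.answer(query)
        target = self.select_language(query)
        # The query language is already the best choice, so there is nothing to aggregate.
        if target == query_lang:
            return self.answer(query)
        # Answer replacement: ask in the target language, then translate the answer back.
        foreign_answer = self.answer(self.translate(query, query_lang, target))
        replaced = self.translate(foreign_answer, target, query_lang)
        # Answer integration: combine the original and cross-lingual answers.
        return self.integrate(self.answer(query), replaced)


def build_toy_pipeline() -> CrossLingualPipeline:
    """Hypothetical toy stubs in which the 'LLM' only knows answers to zh queries."""
    return CrossLingualPipeline(
        detect=lambda q: q.startswith("low-resource"),
        select_language=lambda q: "zh",
        answer=lambda q: "A" if q.startswith("zh:") else "unknown",
        translate=lambda text, src, tgt: f"{tgt}:{text}",
        integrate=lambda orig, other: other if orig == "unknown" else orig,
    )


if __name__ == "__main__":
    pipe = build_toy_pipeline()
    print(pipe.run("low-resource question", "en"))  # en:A
    print(pipe.run("ordinary question", "en"))      # unknown
```

Injecting each component as a callable keeps the sketch agnostic about how the detector is trained or how the LLM is prompted to pick a language, which the abstract does not specify.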

Paper Structure

This paper contains 24 sections, 4 equations, 16 figures, 9 tables, and 1 algorithm.

Introduction
Related Work
  Multilingual LLMs
  Factuality in LLMs
  Hallucination Mitigation
Methodology
  Motivation
  Construction of Low-Resource Dataset
  Low-Resource Knowledge Detector
  Target Language Selection
  Answer Replacement & Integration
Experiments
  Experiment Setup
  Main Results
  Ablation Study
...and 9 more sections

Figures (16)

Figure 1: Top: an example of distinct answers to the same question in different languages. Bottom: GPT-4's performance on 300 queries from HaluEval (li2023halueval) in nine different languages.
Figure 2: The knowledge domain of a multilingual LLM can be divided into multiple sections (two are shown). Language-specific knowledge (pure blue or pure orange) in one language can be used to improve performance in other languages.
Figure 3: Average performance of six LLMs on five datasets, showing accuracy on Chinese-domain and English-domain knowledge with queries and answers in Chinese and in English.
Figure 4: The proposed method begins by using a detector to check whether a query involves low-resource knowledge. If it does, the LLM selects the language most likely to yield the best answer, and answer replacement and integration are then used to formulate the final response.
Figure 5: Statistics of the dataset in our experiments.
...and 11 more figures