
NatLan: Native Language Prompting Facilitates Knowledge Elicitation Through Language Trigger Provision and Domain Trigger Retention

Baixuan Li, Yunlong Fan, Tianyi Ma, Zhiqiang Gao

TL;DR

The paper proposes Native Language Prompting (NatLan), which employs a Multi-MLLM collaboration strategy and introduces an additional role-enhanced, domain-specific MLLM with stronger multilingual understanding capabilities as the translator.

Abstract

Multilingual large language models (MLLMs) do not perform as well when answering questions in non-dominant languages as they do in their dominant languages. Although existing translate-then-answer methods alleviate this issue, the mechanisms behind their effectiveness remain unclear. In this study, we analogize the dominant language of MLLMs to the native language of humans and use two human cognitive features, the Language Trigger (LT) and the Domain Trigger (DT), to interpret the mechanisms behind translate-then-answer methods. This analysis reveals that although these methods provide sufficient LTs, they fall short in retaining DTs. To mitigate this issue, we propose Native Language Prompting (NatLan), which employs a Multi-MLLM collaboration strategy and introduces an additional role-enhanced, domain-specific MLLM with stronger multilingual understanding capabilities as the translator. Across five language QA benchmarks, NatLan achieves up to a 31.28% improvement in accuracy and, compared to existing state-of-the-art methods, provides comparable or greater retention of DTs in up to 87% of cases. Our code is available at https://github.com/AnonyNLP/NatLan.
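The two-model translate-then-answer collaboration described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the linked repository): `NatLanPipeline`, the `LLM` type alias, and the prompt wording are hypothetical, and each model is represented as a plain function from prompt to completion so that any backend can be plugged in. The translator receives a role-enhanced prompt that names the question's domain, which is meant to help preserve domain terminology (DTs), while translating into English supplies the Speaker with its native-language LT.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical alias: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class NatLanPipeline:
    """Illustrative sketch only; names and prompt text are assumptions."""

    translator: LLM  # assumed: stronger multilingual, domain-specific model
    speaker: LLM     # assumed: answers questions in its native language
    native_language: str = "English"

    def translate(self, question: str, domain: str) -> str:
        # Role-enhanced prompt: casting the translator as a domain expert is
        # intended to keep domain-specific terms (Domain Triggers) intact.
        prompt = (
            f"You are an expert in {domain}. Translate the following question "
            f"into {self.native_language}, keeping all {domain} terminology exact. "
            f"Output only the translation.\n\n{question}"
        )
        return self.translator(prompt).strip()

    def answer(self, question: str, domain: str) -> str:
        # Step 1: translate into the Speaker's native language (Language Trigger).
        native_question = self.translate(question, domain)
        # Step 2: the Speaker answers the native-language question.
        return self.speaker(native_question).strip()
```

Keeping the models as plain callables separates the collaboration logic from any particular API client; in practice each callable would wrap a real model call with this signature.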

Paper Structure

This paper contains 28 sections, 15 figures, and 8 tables.

Figures (15)

  • Figure 1: The presence of Language Triggers (LTs) and Domain Triggers (DTs) in questions processed by different methods when addressing non-native language QA. The same icon represents the same question/model.
  • Figure 2: Non-native language question-answering workflow of NatLan. (i) A non-English user issues a query. (ii) The Translator LLM translates the non-native language question into the native language (English) of the Speaker LLM. (iii) The Speaker LLM answers the native-language question. More details are available in the appendix.
  • Figure 3: Pairwise comparison of DT advantage ratios on the MMMLU benchmark under GPT-4o-mini supervision. Gray bars indicate that the questions translated by the two methods retain DTs to a nearly equal degree. More details are available in the appendix.
  • Figure 4: Visualized knowledge activation distributions on the French version of the MMMLU benchmark, with Phi-3-small (7B) as the Speaker LLM. The greater the overlap with the green dots (the human gold standard), the more accurate the knowledge activation is considered to be. More cases are available in the appendix.
  • Figure 5: Activation differences between methods on the same questions. The contents in parentheses indicate whether each Speaker LLM's response is correct.
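The ratios in Figure 3 and the abstract's "comparable or greater retention of DTs" figure come from pairwise judgments of translated questions. Below is a minimal sketch of how such ratios could be tallied, assuming each judge verdict (for example, from GPT-4o-mini) has already been reduced to one of three hypothetical labels: "A" (method A's translation retains more DTs), "B", or "tie". Reading "comparable or greater" as (wins + ties) / total is an assumption here, not a formula stated in the source, and the function name `dt_advantage_ratios` is illustrative.

```python
from collections import Counter
from typing import Iterable

VALID_LABELS = {"A", "B", "tie"}  # hypothetical verdict labels


def dt_advantage_ratios(verdicts: Iterable[str]) -> dict[str, float]:
    """Tally pairwise DT-retention verdicts into ratios (illustrative sketch)."""
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to tally")
    return {
        "A_better": counts["A"] / total,
        "tie": counts["tie"] / total,  # the "gray bar" share in a plot like Figure 3
        "B_better": counts["B"] / total,
        # Assumed reading of "comparable or greater retention" for method A.
        "A_comparable_or_better": (counts["A"] + counts["tie"]) / total,
    }
```

For example, five wins, three ties, and two losses for method A give an A_better ratio of 0.5, a tie ratio of 0.3, and an A_comparable_or_better ratio of 0.8.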