Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

Peng Hu; Sizhe Liu; Changjiang Gao; Xin Huang; Xue Han; Junlan Feng; Chao Deng; Shujian Huang

TL;DR

This paper investigates why large language models (LLMs) exhibit uneven cross-lingual transfer on reasoning tasks by decomposing reasoning into a knowledge retrieval component and a knowledge-free reasoning component. It introduces a knowledge-free reasoning dataset (KFRD) and adapts existing commonsense reasoning datasets to vary retrieval demand, measuring transfer with the cross-lingual transfer ratio (XLTR). The authors find that retrieval demand significantly impedes transfer, whereas knowledge-free reasoning transfers nearly perfectly across languages. Interpretability analyses support this: knowledge-free tasks show higher hidden-state similarity and greater neuron activation overlap across languages. These results suggest that knowledge is stored in a language-specific way, whereas reasoning relies on neural mechanisms shared across languages, with practical implications for prioritizing training data and for multilingual evaluation. Transfer is quantified as $XLTR(s,t)=\left(\frac{|C_s \cap C_t|}{|C_s|}-A_r\right)/(1-A_r)$, where $C_s$ and $C_t$ are the sets of questions answered correctly in source language $s$ and target language $t$, and $A_r$ is the accuracy of random guessing; the interpretability analyses rely on the cosine similarity of hidden states (CS) and neuron activation overlap (NAO).
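The XLTR formula can be computed directly from per-question correctness. The following is a minimal Python sketch, assuming the evaluation questions are parallel across languages (the same ID denotes the same question in every language) and that the caller supplies $A_r$, e.g. 0.5 for a binary yes/no task. The function name `xltr` is hypothetical and not taken from the authors' released code.

```python
from typing import Hashable, Set


def xltr(correct_src: Set[Hashable], correct_tgt: Set[Hashable], random_acc: float) -> float:
    """Cross-lingual transfer ratio from source language s to target language t.

    correct_src: IDs of questions answered correctly in the source language (C_s)
    correct_tgt: IDs of the same parallel questions answered correctly in the target (C_t)
    random_acc:  accuracy of random guessing on the task (A_r); must be in [0, 1)
    """
    if not correct_src:
        raise ValueError("C_s is empty, so XLTR is undefined")
    if not 0.0 <= random_acc < 1.0:
        raise ValueError("A_r must lie in [0, 1)")
    # Fraction of source-correct questions that are also correct in the target language.
    overlap = len(correct_src & correct_tgt) / len(correct_src)
    # Rescale so that chance-level agreement maps to 0 and full transfer maps to 1.
    return (overlap - random_acc) / (1.0 - random_acc)


if __name__ == "__main__":
    print(xltr({1, 2, 3, 4}, {2, 3, 4, 5}, 0.5))  # 0.5
```

With $C_s=\{1,2,3,4\}$, $C_t=\{2,3,4,5\}$ and $A_r=0.5$, three of the four source-correct questions survive transfer, giving $(0.75-0.5)/0.5=0.5$. A target set containing every source-correct question gives 1.0 regardless of $A_r$.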

Abstract

Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separate components, knowledge retrieval and knowledge-free reasoning, and analyze how cross-lingual transferability relates to each. With adapted commonsense reasoning datasets and newly constructed knowledge-free reasoning datasets, we show that knowledge-free reasoning capability transfers nearly perfectly across various source-target language directions, despite a secondary effect of resource level in some target languages, while cross-lingual knowledge retrieval significantly hinders transfer. Moreover, by analyzing hidden states and feed-forward network neuron activations during reasoning, we show that higher similarity of hidden representations and larger overlap of activated neurons could explain why knowledge-free reasoning transfers across languages better than knowledge retrieval does. We therefore hypothesize that knowledge-free reasoning relies on similar neurons across languages, while knowledge is stored separately for different languages. Our code and data are available at: https://github.com/NJUNLP/Knowledge-Free-Reasoning.
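The two interpretability quantities can be sketched in pure Python. This is an illustration under stated assumptions, not the authors' implementation: hidden-state similarity is taken as the cosine similarity between the representations of a parallel input in two languages; a feed-forward neuron is treated as activated when its activation exceeds a threshold (0 by default, a hypothetical choice); and overlap is measured as the Jaccard ratio of the two activated sets. All function names are hypothetical, and the paper's exact definitions (which layers, which tokens, how scores are averaged) may differ.

```python
import math
from typing import Sequence, Set


def cosine_similarity(h_s: Sequence[float], h_t: Sequence[float]) -> float:
    """Cosine similarity between two hidden-state vectors of equal dimension."""
    if len(h_s) != len(h_t):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(h_s, h_t))
    norm = math.sqrt(sum(a * a for a in h_s)) * math.sqrt(sum(b * b for b in h_t))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def activated_neurons(acts: Sequence[float], threshold: float = 0.0) -> Set[int]:
    """Indices of FFN neurons whose activation exceeds `threshold` (assumed criterion)."""
    return {i for i, a in enumerate(acts) if a > threshold}


def neuron_activation_overlap(
    acts_s: Sequence[float], acts_t: Sequence[float], threshold: float = 0.0
) -> float:
    """Jaccard overlap of activated-neuron sets for two languages (assumed form of NAO)."""
    a_s = activated_neurons(acts_s, threshold)
    a_t = activated_neurons(acts_t, threshold)
    union = a_s | a_t
    if not union:
        return 0.0  # no neuron fired in either language
    return len(a_s & a_t) / len(union)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 2.0, 2.0], [1.0, 2.0, 2.0]))  # 1.0
    # Activated sets are {0, 2} and {0, 1}: one shared neuron out of three, about 0.333
    print(neuron_activation_overlap([0.5, -0.1, 0.3, 0.0], [0.2, 0.4, 0.0, -0.3]))
```

Under this reading, the abstract's claim corresponds to knowledge-free inputs yielding scores closer to 1 across languages than knowledge retrieval inputs.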

Paper Structure (59 sections, 3 equations, 18 figures, 8 tables)

Introduction
Evaluation Methodology
Overview
Impact of Knowledge Retrieval Demand on Cross-Lingual Transfer in Reasoning Tasks
Cross-Lingual Transfer of Knowledge-Free Reasoning
Datasets
Reasoning dataset with variable knowledge retrieval demand
Knowledge-free reasoning dataset
Evaluation metric
Experiment Settings
Language and model choice
Language choice
Model choice
Fine-tuning and decoding settings
Results
...and 44 more sections

Figures (18)

Figure 1: Cross-lingual transfer involves training a model in one language and evaluating it in another; the scenario depicted here trains in English. Reasoning tasks encompass both knowledge retrieval and knowledge-free reasoning. The cross-lingual transfer ratio is significantly lower for knowledge retrieval tasks (e.g., a ZH test case, shown in English: "Crocodiles, alligators, and pigeons are dangerous animals") than for knowledge-free reasoning tasks, which transfer well across languages (e.g., a ZH test case, shown in English: "22 plus 23 equals 45").
Figure 2: XLTR of different models on StrategyQA. Solid lines: WF-all results; dashed lines: NF results. The training-language label (en) is capitalized.
Figure 3: XLTR of LLaMA-2-7B-Chat on StrategyQA under different settings.
Figure 4: XLTR on the different parts of KFRD.
Figure 5: XLTR on the existing pseudo knowledge-free reasoning datasets.
...and 13 more figures
