Delving into the Reversal Curse: How Far Can Large Language Models Generalize?

Zhengkai Lin, Zhihang Fu, Kai Liu, Liang Xie, Binbin Lin, Wenxiao Wang, Deng Cai, Yue Wu, Jieping Ye

TL;DR

This paper proposes and verifies the hypothesis that LLMs possess an inherent bias in fact recall during knowledge application, which explains and underscores the importance of document structure to successful learning.

Abstract

While large language models (LLMs) showcase unprecedented capabilities, they also exhibit certain inherent limitations when facing seemingly trivial tasks. A prime example is the recently debated "reversal curse", which surfaces when models, having been trained on the fact "A is B", struggle to generalize this knowledge to infer that "B is A". In this paper, we examine the manifestation of the reversal curse across various tasks and delve into both the generalization abilities and the problem-solving mechanisms of LLMs. This investigation leads to a series of significant insights: (1) LLMs are able to generalize to "B is A" when both A and B are present in the context, as in a multiple-choice question. (2) This generalization ability is highly correlated with the structure of the fact "A is B" in the training documents. For example, the generalization applies only to biographies structured as "[Name] is [Description]" and not to those structured as "[Description] is [Name]". (3) We propose and verify the hypothesis that LLMs possess an inherent bias in fact recall during knowledge application, which explains and underscores the importance of document structure to successful learning. (4) The negative impact of this bias on the downstream performance of LLMs can hardly be mitigated through training alone. These findings offer a novel perspective on interpreting LLMs' generalization through their intrinsic mechanisms and provide insights for developing more effective learning methods. Our code and data are available at https://github.com/alibaba/thinking_bias.git.
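To make the two document structures and the two task settings from the abstract concrete, the following is a minimal Python sketch. It is not taken from the authors' repository: the `Fact` class, the helper names, the prompt templates, and the example person are all hypothetical, chosen only to illustrate how one fact can be rendered as "[Name] is [Description]" or "[Description] is [Name]" and then probed with an open-ended question or a multiple-choice question.

```python
import random
from dataclasses import dataclass

LETTERS = "ABCD"  # assumed option labels; the paper's exact format may differ


@dataclass(frozen=True)
class Fact:
    """A single biographical fact linking a name to a description."""
    name: str
    description: str


def name_is_description(fact: Fact) -> str:
    """Training document in the "[Name] is [Description]" structure."""
    return f"{fact.name} is {fact.description}."


def description_is_name(fact: Fact) -> str:
    """Training document in the "[Description] is [Name]" structure."""
    desc = fact.description
    return f"{desc[0].upper()}{desc[1:]} is {fact.name}."


def qa_prompts(fact: Fact) -> dict[str, tuple[str, str]]:
    """Open-ended (question, answer) pairs, one per query direction."""
    return {
        "name_to_description": (f"Who is {fact.name}?", fact.description),
        "description_to_name": (f"Who is {fact.description}?", fact.name),
    }


def mcq_prompt(fact: Fact, distractors: list[str],
               rng: random.Random) -> tuple[str, str]:
    """Multiple-choice question in which both A and B appear in the context.

    Returns the prompt text and the letter of the correct option.
    """
    options = [fact.name, *distractors]
    if len(options) > len(LETTERS):
        raise ValueError("too many options for the available letters")
    rng.shuffle(options)
    lines = [f"Question: Who is {fact.description}?"]
    lines += [f"({letter}) {opt}" for letter, opt in zip(LETTERS, options)]
    answer = LETTERS[options.index(fact.name)]
    return "\n".join(lines), answer


if __name__ == "__main__":
    # Invented example; not a sample from the paper's dataset.
    fact = Fact("Mira Vale", "the composer of the fictional symphony 'Glass Tides'")
    print(name_is_description(fact))
    print(description_is_name(fact))
    prompt, answer = mcq_prompt(
        fact, ["Tom Ashby", "Lena Okoro", "Ravi Stone"], random.Random(0)
    )
    print(prompt)
    print("Correct option:", answer)
```

Under this framing, the reversal curse corresponds to failing the open-ended question whose direction is the reverse of the training document, while finding (1) concerns the multiple-choice prompt, where the name and the description are both visible to the model.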

Paper Structure

This paper contains 43 sections, 1 equation, 11 figures, 21 tables.

Figures (11)

  • Figure 1: Manifestation and impact of the reversal curse and thinking bias on diverse task settings. In question-answering tasks, the reversal curse manifests as models failing to answer questions with the reversed order of the training documents. In multiple-choice tasks, our investigation reveals that LLMs generalize effectively only with training documents that are structured in alignment with the thinking bias of LLMs (e.g., with name as the subject of the biographical fact).
  • Figure 2: Relative intensities of $S_{nt}$ and $S_{dt}$ across all layers of the LLaMA2-7B and 13B models on the celebrities dataset. Orange lines denote the relative intensity of the information flow from names. Blue lines denote the relative intensity of the information flow from descriptions.
  • Figure 3: Visualization of the distribution of saliency scores in different tasks on the DescriptionIsName subset. As indicated by the intensity of the red shading in each rectangle, the distribution of saliency scores in MCQs is largely shifted toward and concentrated on the names, which is consistent with our hypothesis of LLMs' thinking bias.
  • Figure 4: Multiple-choice test accuracies on the DescriptionIsName subset across training. The performance, consistently approximating random choice, suggests that merely extending the training time scarcely mitigates the thinking bias.
  • Figure 5: Results from mix training and QA finetuning mitigation experiments. Both strategies can only help models' performance on in-domain questions, while the near-random choice performance on out-of-domain (OOD) questions underscores the persistence of the thinking bias.
  • ...and 6 more figures