An Analysis and Mitigation of the Reversal Curse

Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan

TL;DR

The paper investigates the reversal curse in LLMs, where a model that has learned a relation can infer b from a but fails to infer a from b via the inverse relation. It attributes at least part of this phenomenon to the next-token prediction (NTP) objective and shows that training with an autoregressive blank infilling (ABI)-style objective can mitigate the effect. The authors introduce BICO, a fine-tuning framework that enables bidirectional attention for causal LMs and combines a masked denoising objective with controlled NTP updates to preserve generation ability. Across synthetic name–description data, GSM8k-style backward math, and translation tasks, BICO substantially reduces reversal errors (up to ~70% reverse-task accuracy) while maintaining forward performance. The results highlight how strongly the training objective shapes reasoning capabilities and offer a practical mitigation strategy.

Abstract

Recent research has observed a noteworthy phenomenon in large language models (LLMs), referred to as the "reversal curse." The reversal curse is that, when dealing with two entities $a$ and $b$ connected by a relation $R$ and its inverse $R^{-1}$, LLMs excel at handling sequences of the form "$aRb$" but encounter challenges when processing "$bR^{-1}a$," whether in generation or comprehension. For instance, GPT-4 can accurately respond to the query "Tom Cruise's mother is?" with "Mary Lee Pfeiffer," but it struggles to provide a satisfactory answer when asked "Mary Lee Pfeiffer's son is?" In this paper, we undertake the first-ever study of how the reversal curse happens in LLMs. Our investigations reveal that the reversal curse can stem from the specific training objective, which becomes particularly evident in the widespread use of next-token prediction within most causal language models. We hope this initial investigation can draw more attention to the reversal curse, as well as to other underlying limitations in current LLMs.
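
To make the "$aRb$" versus "$bR^{-1}a$" distinction concrete, the following is a minimal sketch of how a forward/reverse query pair could be built and scored. The `Fact` class, the prompt templates, and `toy_model` are hypothetical names introduced only for illustration; `toy_model` is a lookup table standing in for an LLM that has memorised the forward string, not the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    a: str        # head entity
    b: str        # tail entity
    forward: str  # template for "a R b"; the model should complete with b
    reverse: str  # template for "b R^{-1} a"; the model should complete with a


def make_queries(fact):
    """Return {direction: (prompt, expected completion)} for one fact."""
    return {
        "forward": (fact.forward.format(a=fact.a), fact.b),
        "reverse": (fact.reverse.format(b=fact.b), fact.a),
    }


def exact_match(answer, expected):
    """Case-insensitive exact match between model answer and ground truth."""
    return answer.strip().lower() == expected.strip().lower()


fact = Fact(
    a="Tom Cruise",
    b="Mary Lee Pfeiffer",
    forward="{a}'s mother is",
    reverse="{b}'s son is",
)
queries = make_queries(fact)

# Hypothetical stand-in for an LLM: it only "knows" the forward string.
memorised = {"Tom Cruise's mother is": "Mary Lee Pfeiffer"}


def toy_model(prompt):
    return memorised.get(prompt, "")


results = {
    direction: exact_match(toy_model(prompt), gold)
    for direction, (prompt, gold) in queries.items()
}
print(results)
```

Swapping `toy_model` for a call to a real model would turn this into a small probe for the asymmetry described above; averaging `results` over many facts gives per-direction accuracy.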

Paper Structure (24 sections, 8 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Different training objectives of language models. Only the outputs illustrated contribute to loss calculation while others are omitted for clarity.
  • Figure 2: Data employed for studying the reversal curse on relation $R_{N2D}$. All names and descriptions are fictitious. During the test stage, the model is given the "prompt" and the ground truth is the content of "completion." For example, in the $N2D$ task, the model is given the same names as those encountered during fine-tuning but is presented with paraphrased prompts. In the $\mathop{N2D}\limits^{\longleftarrow}$ task, the model is tasked with generating the corresponding names based on descriptions seen during fine-tuning.
  • Figure 3: (a) Training details in BICO. BICO modifies the causal attention into a bidirectional one. Attention calculations are partitioned into two parts based on the relative positions of query and key vectors. Numbers in squares denote the relative distance between $q_{m}$ and $k_{n}$. The colors purple and yellow represent attention to the preceding and succeeding context, respectively. Grey squares denote that padding tokens are excluded from the attention calculation. (b) During inference, the language model adopts causal attention as usual and predicts tokens autoregressively. For clarity, we only illustrate a single transformer layer and omit irrelevant modules.
  • Figure 4: A test sample from the original GSM8k dataset (Cobbe et al., 2021), alongside its "reversal" counterpart crafted by Yu et al. (2023). Solving the reversal question requires models trained solely on the original GSM8k training set to exhibit backward reasoning ability.
  • Figure 5: The probability that various models assign to the desired completion given the prompts in the $\mathop{D2N}\limits^{\longleftarrow}$ task. This probability is evaluated across the entire test set and is presented as an average. BICO increases the likelihood of predicting the ground truth.
  • ...and 3 more figures
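
The attention layout described in the Figure 3 caption can be illustrated with a small mask-building sketch. This is only an illustration of the caption, not the authors' implementation: the function name `attention_masks`, the right-padding convention, and the choice to mask padding both as queries and as keys are assumptions made here.

```python
def attention_masks(seq_len, num_pad=0):
    """Build relative distances and attention masks for one sequence.

    Assumes the last `num_pad` positions are padding (right-padding).
    Returns (rel, preceding, succeeding, bidirectional), each a
    seq_len x seq_len nested list indexed as [query m][key n].
    """
    real = seq_len - num_pad
    idx = range(seq_len)

    # rel[m][n] = m - n: distance between query q_m and key k_n.
    rel = [[m - n for n in idx] for m in idx]

    # A (query, key) pair is usable only if neither position is padding.
    valid = [[m < real and n < real for n in idx] for m in idx]

    # Preceding context (key at or before the query): the usual causal mask.
    preceding = [[valid[m][n] and rel[m][n] >= 0 for n in idx] for m in idx]

    # Succeeding context (key after the query): the part added for
    # bidirectional training.
    succeeding = [[valid[m][n] and rel[m][n] < 0 for n in idx] for m in idx]

    bidirectional = [
        [preceding[m][n] or succeeding[m][n] for n in idx] for m in idx
    ]
    return rel, preceding, succeeding, bidirectional


rel, causal, future, bidir = attention_masks(seq_len=4, num_pad=1)
for row in bidir:
    print("".join("x" if allowed else "." for allowed in row))
```

Using `causal` alone corresponds to the inference-time setting in panel (b), while `bidir` corresponds to the training-time setting in panel (a), under the assumptions stated above.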