Reverse Training to Nurse the Reversal Curse

Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar

TL;DR

The paper tackles the reversal curse in LLMs, where facts learned in one direction fail to generalize to the reverse direction. It introduces reverse training, a data-augmentation scheme that doubles the training data by including reversed strings and treats the reverse direction as a second language. Four reversal types are considered: token, word, entity-preserving, and random segment reversal. Through symbolic tasks, biographies, real-world knowledge pre-training, and fictitious-facts finetuning, the authors show that reverse training (especially entity-preserving and random segment reversal) mitigates the reversal curse and can even improve standard forward performance in data-bound setups. The approach yields strong reversal-task performance (e.g., near 100% on NameToDescription in larger models) without compromising forward capabilities, suggesting a practical path to more robust knowledge generalization in LLMs. Overall, reverse training provides a general, low-overhead strategy to reduce directionality biases in language models across pre-training and fine-tuning contexts.

Abstract

Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even when training with trillions of tokens, this issue still appears due to Zipf's law, and hence would persist even if we trained on the entire internet. This work proposes an alternative training scheme, called reverse training, whereby all words are used twice, doubling the amount of available tokens. The LLM is trained in both forward and reverse directions by reversing the training strings while preserving (i.e., not reversing) chosen substrings, such as entities. We show that data-matched reverse-trained models provide superior performance to standard models on standard tasks, and compute-matched reverse-trained models provide far superior performance on reversal tasks, helping resolve the reversal curse issue.
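
To make the reversal variants concrete, here is a minimal Python sketch of word reversal, entity-preserving reversal, and random segment reversal. It is an illustration under stated assumptions, not the authors' implementation: it splits on whitespace rather than using a real tokenizer, it takes the entity list as given rather than detecting entities, and the function names, the `max_len` parameter, and the uniform segment-length sampling are all hypothetical choices made for this example.

```python
import random


def reverse_words(text):
    """Word reversal: reverse the order of whitespace-separated words."""
    return " ".join(reversed(text.split()))


def reverse_preserving_entities(text, entities):
    """Reverse word order, but keep the words inside each entity in order.

    `entities` is a list of entity strings supplied by the caller
    (assumption: entity detection happens elsewhere).
    """
    words = text.split()
    # Longest entities first, so the longest match wins; skip empty entries.
    entity_words = sorted(
        (e.split() for e in entities if e.split()), key=len, reverse=True
    )
    segments = []
    i = 0
    while i < len(words):
        for ent in entity_words:
            if words[i:i + len(ent)] == ent:
                segments.append(" ".join(ent))  # keep the entity intact
                i += len(ent)
                break
        else:
            segments.append(words[i])
            i += 1
    return " ".join(reversed(segments))


def reverse_random_segments(text, max_len, rng):
    """Split into segments of random length in [1, max_len] words, then
    reverse the segment order while keeping each segment's internal order.
    (Assumption: lengths are sampled uniformly.)"""
    words = text.split()
    segments = []
    i = 0
    while i < len(words):
        n = rng.randint(1, max_len)
        segments.append(" ".join(words[i:i + n]))
        i += n
    return " ".join(reversed(segments))


def augment(corpus, reverse_fn):
    """Return the forward examples followed by their reversed copies."""
    return list(corpus) + [reverse_fn(example) for example in corpus]


sentence = "Abraham Lincoln was born in Kentucky"
print(reverse_words(sentence))
print(reverse_preserving_entities(sentence, ["Abraham Lincoln"]))
print(reverse_random_segments(sentence, 3, random.Random(0)))
```

For the sample sentence, word reversal yields "Kentucky in born was Lincoln Abraham", while the entity-preserving variant yields "Kentucky in born was Abraham Lincoln", leaving the name readable in the reversed string. Because the sketch splits on whitespace, punctuation stays attached to its neighboring word.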

Paper Structure

This paper contains 19 sections, 3 equations, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Training loss for 1.4B models in the pre-training stage. The $x$-axis shows the total number of tokens the model has been trained on, counting both the standard and reverse directions.
  • Figure 2: Evaluation results during training on the real-world celebrity task when using different pre-training methods for LLMs.