Examining Two Hop Reasoning Through Information Content Scaling

David Johnston, Nora Belrose

TL;DR

The paper investigates why latent two-hop reasoning is hard for transformers, using information content scaling as a quantitative interpretability tool. It builds synthetic two-hop QA datasets, trains transformers of varying sizes with muP, and estimates the information each model has learned from the dataset entropy and its loss, comparing the result against the capacity predicted by three candidate algorithms: recurrent composition, two-function composition, and independent memorization. The findings indicate that latent two-hop QA is best explained by the two-function composition strategy, with capacity near $2$ bits per parameter, while chain-of-thought reasoning is considerably more parameter-efficient; probing methods provide weaker signals. The work shows that information content scaling can complement traditional interpretability techniques, though applying it broadly faces practical challenges and some results depend on dataset and hyperparameter choices.

Abstract

Prior work has found that transformers have an inconsistent ability to learn to answer latent two-hop questions -- questions of the form "Who is Bob's mother's boss?" We study why this is the case by examining how transformers' capacity to learn datasets of two-hop questions and answers (two-hop QA) scales with their size, motivated by prior work on transformer knowledge capacity for simple factual memorization. We find that capacity scaling and generalization both support the hypothesis that latent two-hop QA requires transformers to learn each fact twice, while two-hop QA with chain of thought does not. We also show that with appropriate dataset parameters, it is possible to "trap" very small models in a regime where they memorize answers to two-hop questions independently, even though they would perform better if they could learn to answer them with function composition. Our findings show that measurement of capacity scaling can complement existing interpretability methods, though there are challenges in using it for this purpose.
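The capacity argument in the abstract can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only an illustration under stated assumptions, not the paper's estimator: it assumes a synthetic fact table in which every (entity, relation) pair has an answer drawn uniformly from the entity set, takes information content to be dataset entropy minus total loss in bits (consistent with the Figure 1 caption, where zero loss corresponds to the full dataset entropy), and uses the roughly 2 bits per parameter figure from the TL;DR. All function names and the example sizes are hypothetical.

```python
import math


def one_hop_entropy_bits(n_entities: int, n_relations: int) -> float:
    """Entropy of a toy fact table (assumption: each answer is uniform over entities)."""
    return n_entities * n_relations * math.log2(n_entities)


def information_content_bits(dataset_entropy_bits: float, total_loss_bits: float) -> float:
    """Assumed estimator: bits of the dataset the model has absorbed.

    Zero loss gives the full dataset entropy; higher loss gives less.
    """
    return dataset_entropy_bits - total_loss_bits


def params_needed(fact_bits: float, copies: int, bits_per_param: float = 2.0) -> float:
    """Parameters needed to store `copies` copies of the facts at a given capacity."""
    return copies * fact_bits / bits_per_param


# Hypothetical dataset: 1024 entities, 4 relations -> 1024 * 4 * 10 bits.
facts = one_hop_entropy_bits(1024, 4)

# Chain of thought: each fact stored once. Latent two-hop under the
# "learn each fact twice" hypothesis: each fact stored twice.
with_cot = params_needed(facts, copies=1)
latent = params_needed(facts, copies=2)

print(facts, with_cot, latent)
```

Under these assumptions, the latent two-hop setting needs twice the parameters of the chain-of-thought setting to store the same facts, which is the kind of scaling difference the capacity measurements are meant to detect.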

Paper Structure

This paper contains 22 sections, 17 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Observed information content scaling on one-hop questions for 4 layer transformers. A loss of zero implies that the information content is equal to the dataset entropy, and a loss equal to predicting the uniform distribution over all answers in the dataset yields the information content level represented by "baseline" (this is nontrivial due to needing to learn the set of names in the dataset out of all possible names). The dashed black line represents a content of 2 bits per parameter.
  • Figure 2: Observed information content scaling on two-hop questions without chain-of-thought generation for 4 layer transformers, assuming the two-function composition computational model. The dataset entropy and baseline curves are not quite constant due to randomly held out attributes differing between datasets.
  • Figure 3: Observed information content scaling on two-hop questions with chain-of-thought generation for 4 layer transformers. The information content exceeds the capacity estimate for two-function composition and approaches the capacity for recurrent composition, as expected.
  • Figure 4: The generalization gap for two-hop question answering. In most cases the gap increased as the parameter count increased. Models trained on the 1000 profile dataset were an exception, with the largest model on this dataset achieving nearly 0 loss on both the train and evaluation sets.
  • Figure 5: Anomalously low information content for models trained with only 4 relations. The estimated information content approximates the capacity curve obtained by assuming independent memorization of all two-hop questions.
  • ...and 4 more figures