France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions

Sasha Boguraev, Qing Yao, Kyle Mahowald

TL;DR

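In language models, redundancy avoidance in repeated disjunctions arises from two interacting mechanisms: binding contextually relevant information to repeated lexical items, and induction heads that selectively attend to these context-licensed representations. We argue that this neural explanation sheds light on the mechanisms underlying context-sensitive semantic interpretation, and that it complements existing symbolic analyses.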

Abstract

Sentences like "She will go to France or Spain, or perhaps to Germany or France." appear formally redundant, yet become acceptable in contexts such as "Mary will go to a philosophy program in France or Spain, or a mathematics program in Germany or France." While this phenomenon has typically been analyzed using symbolic formal representations, we aim to provide a complementary account grounded in artificial neural mechanisms. We first present new behavioral evidence from humans and large language models demonstrating the robustness of this apparent non-redundancy across contexts. We then show that, in language models, redundancy avoidance arises from two interacting mechanisms: models learn to bind contextually relevant information to repeated lexical items, and Transformer induction heads selectively attend to these context-licensed representations. We argue that this neural explanation sheds light on the mechanisms underlying context-sensitive semantic interpretation, and that it complements existing symbolic analyses.

Paper Structure (27 sections, 4 figures)

Figures (4)

  • Figure 1: In context, our critical sentences become non-redundant and large LMs reliably produce the repeated item; in the control condition, they suppress the copy mechanism.
  • Figure 2: Left: We broadly see LM competency, with a high rate of the repeated answer (X) as the top choice. Right: We also see order preferences in LMs. Namely, small models show a strong preference for matching the ordering of the second disjunction, but as models get larger this preference is superseded by a preference for total matching between S1 and S2.
  • Figure 3: We perform activation patching across models and find disjunctive elements to be bound in different contexts.
  • Figure 4: Average attention patterns of top-9 induction heads on the control condition and each of the critical conditions.