Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models

Aruna Sankaranarayanan, Dylan Hadfield-Menell, Aaron Mueller

TL;DR

The paper investigates whether large language models develop distinct, interpretable mechanisms for processing hierarchical versus linear grammars, independent of human biases. It employs a broad suite of open-weight LLMs and a comprehensive set of grammars across English, Italian, Japanese, plus nonce-word Jabberwocky variants, in a four-experiment program including behavioral comparisons, mechanistic localization, causal ablations, and cross-domain tests. Key findings show that hierarchy-sensitive processing is largely separable from linearity-sensitive processing, evidenced by distinct component overlaps and selective ablations, and that hierarchy sensitivity persists even with nonce inputs, suggesting abstract, meaning-independent mechanisms. These results imply that functional specialization toward hierarchical linguistic structure can arise from exposure to language data alone, with implications for mechanistic interpretability and the understanding of syntax in AI systems.

Abstract

All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are sensitive only to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the underlying grammars to conform to either hierarchical or linear/positional rules. Using these grammars, we first observe that language models behave differently on hierarchically structured inputs than on linearly structured ones. Then, we find that the components responsible for processing hierarchical grammars are distinct from those that process linear grammars; we causally verify this in ablation experiments. Finally, we observe that hierarchy-selective components are also active on nonce grammars; this suggests that hierarchy sensitivity is tied neither to meaning nor to in-distribution inputs.

Paper Structure (35 sections, 2 equations, 15 figures, 12 tables)


Figures (15)

  • Figure 1: Few-shot accuracy on the grammaticality judgment task on hierarchical and linear inputs. On average, all models perform better on hierarchical inputs than on linear inputs. On hierarchical grammars, models are best at processing English inputs, followed by Italian and Japanese. Model-wise accuracy on this task is shown in Figure \ref{fig:expt1-model-wise-bars} in App. \ref{appendix:expt-1}. Grammar-wise accuracy is shown in Table \ref{tab:expt1-model-accuracies-conv} in App. \ref{appendix:expt-1}.
  • Figure 2: Mean pairwise overlap percentage of the top 1% of neurons from hierarchical (H) or linear (L) grammars. We show means across models (error bars are standard errors); see Figure \ref{fig:expt2-model-wise-overlaps-en-it-ja} in App. \ref{appendix:exp2} for model-wise results. Overlaps differ significantly (p < 0.001, Table \ref{tab:expt2-stat-sig-conv}) between hierarchical-hierarchical pairs and linear-linear pairs, and between hierarchical-hierarchical pairs and hierarchical-linear pairs.
  • Figure 3: Mean relative change in accuracy across models (error bars are standard errors) after ablating the top 1% of neurons from hierarchical (H) or linear (L) grammars. We compare to a random ablation baseline. For model-wise ablations, see Figure \ref{fig:expt3-model-ablations-en-it-ja} in App. \ref{appendix:exp3}.
  • Figure 4: Results on Jabberwocky grammars. We show grammaticality judgment task performance (a), mean neuron overlap percentages between Jabberwocky hierarchical and linear grammars (b), neuron overlaps between English and Jabberwocky grammars (c), and the mean relative changes in accuracy as measured on Jabberwocky grammars after ablating the top 1% of neurons corresponding to English grammars (d). See App. \ref{appendix:exp4} for model-wise results.
  • Figure 5: Experiment 1. Model-wise accuracy on the grammaticality judgment task given hierarchical and linear inputs from English, Italian, and Japanese (see § \ref{sec:exp1} and Tables \ref{tab:expt1-model-accuracies-conv} and \ref{tab:all-template-examples}).
  • ...and 10 more figures
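
The two quantities reported in Figures 2 and 3 (pairwise overlap of the top 1% of neurons, and relative change in accuracy after ablation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The per-neuron importance scores, the overlap definition (shared neurons divided by set size), and the relative-change formula (accuracy after minus accuracy before, divided by accuracy before) are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def top_fraction(scores, fraction=0.01):
    """Return the indices of the highest-scoring `fraction` of neurons.

    `scores` is a hypothetical 1-D array of per-neuron importance scores;
    how such scores are obtained is not specified here.
    """
    k = max(1, int(round(fraction * scores.size)))
    return set(np.argsort(scores)[-k:].tolist())


def overlap_percentage(set_a, set_b):
    """Assumed definition: shared neurons as a percentage of |set_a|."""
    return 100.0 * len(set_a & set_b) / len(set_a)


def mean_pairwise_overlap(sets_x, sets_y):
    """Mean overlap over all pairs (a, b), skipping a set paired with itself."""
    values = [
        overlap_percentage(a, b)
        for a in sets_x
        for b in sets_y
        if a is not b
    ]
    return float(np.mean(values))


def relative_change(acc_before, acc_after):
    """Assumed definition: signed change in accuracy relative to the baseline."""
    return (acc_after - acc_before) / acc_before


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic scores for three "hierarchical" and three "linear" grammars.
    hier = [top_fraction(rng.normal(size=10_000)) for _ in range(3)]
    lin = [top_fraction(rng.normal(size=10_000)) for _ in range(3)]
    print("H-H overlap (%):", mean_pairwise_overlap(hier, hier))
    print("H-L overlap (%):", mean_pairwise_overlap(hier, lin))
    print("Relative change:", relative_change(0.80, 0.60))
```

With random scores, as in the demo, overlaps should stay near chance level. The paper's claim concerns how H-H, L-L, and H-L overlaps compare when the scores come from real models.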