Linguistically Grounded Analysis of Language Models using Shapley Head Values

Marcell Fekete, Johannes Bjerva

TL;DR

This work examines how morphosyntactic knowledge is encoded in language models by applying Shapley Head Values (SHVs) to identify attention-head subnetworks in BERT and RoBERTa as they process BLiMP morphosyntactic constructions. The authors derive SHVs via gating and permutation-based marginal contributions, cluster paradigms by SHV profiles, and validate the clusters with pruning, which reveals localized subnetworks corresponding to linguistic categories. Key findings include substantial cross-model cluster consistency (6 of 10 clusters), alignment with categorical morphosyntactic phenomena (e.g., NPI licensing, Binding), and varying locality across models, with RoBERTa generally more discriminative and locality-focused than BERT. The work advances interpretable NLP by grounding attribution in linguistic theory, offering a linguistically meaningful lens for cross-linguistic model analysis and interpretability, and providing an implementation for SHV-based probing and pruning.

Abstract

Understanding how linguistic knowledge is encoded in language models is crucial for improving their generalisation capabilities. In this paper, we investigate the processing of morphosyntactic phenomena, by leveraging a recently proposed method for probing language models via Shapley Head Values (SHVs). Using the English language BLiMP dataset, we test our approach on two widely used models, BERT and RoBERTa, and compare how linguistic constructions such as anaphor agreement and filler-gap dependencies are handled. Through quantitative pruning and qualitative clustering analysis, we demonstrate that attention heads responsible for processing related linguistic phenomena cluster together. Our results show that SHV-based attributions reveal distinct patterns across both models, providing insights into how language models organize and process linguistic information. These findings support the hypothesis that language models learn subnetworks corresponding to linguistic theory, with potential implications for cross-linguistic model analysis and interpretability in Natural Language Processing (NLP).
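The summary above describes SHVs as derived from gating and permutation-based marginal contributions. As a rough illustration of that idea, the sketch below estimates Shapley values for a handful of heads by sampling permutations and averaging each head's marginal contribution. It is not the authors' implementation: the head names, `TOY_WEIGHTS`, and `toy_accuracy` are hypothetical stand-ins for evaluating a real model with some heads gated off.

```python
import random
from typing import Callable, Dict, FrozenSet, List

# Hypothetical per-head contributions, used only to build a toy value function.
TOY_WEIGHTS: Dict[str, float] = {"L0H0": 0.30, "L0H1": 0.05, "L1H0": 0.10, "L1H1": 0.0}


def toy_accuracy(active_heads: FrozenSet[str]) -> float:
    """Stand-in for 'accuracy of the model when only these heads are un-gated'."""
    return 0.5 + sum(TOY_WEIGHTS[h] for h in active_heads)


def estimate_shapley_head_values(
    heads: List[str],
    value_fn: Callable[[FrozenSet[str]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Monte Carlo Shapley estimate: average marginal gain over random head orders."""
    rng = random.Random(seed)
    totals = {h: 0.0 for h in heads}
    for _ in range(num_permutations):
        order = list(heads)
        rng.shuffle(order)
        active: set = set()
        previous = value_fn(frozenset(active))  # all heads gated off
        for head in order:
            active.add(head)  # open the gate for this head
            current = value_fn(frozenset(active))
            totals[head] += current - previous  # marginal contribution
            previous = current
    return {h: total / num_permutations for h, total in totals.items()}


if __name__ == "__main__":
    shv = estimate_shapley_head_values(list(TOY_WEIGHTS), toy_accuracy)
    for head, value in sorted(shv.items()):
        print(head, round(value, 3))
```

Because the toy value function is additive, every permutation yields the same marginal contribution for a given head, so the estimate matches `TOY_WEIGHTS` up to floating-point error. With a real model the contributions depend on which other heads are active, which is why sampling many permutations matters.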

Paper Structure

This paper contains 31 sections, 4 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: In the fine-tuning step, we train a classifier on a grammaticality judgement task. We carry out Shapley Head Value (SHV) attributions, and in the interpretation step, we carry out quantitative analysis using pruning as well as qualitative experiments using linguistic grounding.
  • Figure 2: The number of clusters is chosen by trading off inertia against cluster count, which results in 10 clusters.
  • Figure 3: Mean $\Delta$ accuracy values drop in a near-linear fashion when pruning up to the top $n$ heads across paradigms.
  • Figure 4: Distribution of baseline accuracy levels without pruning across the BERT and RoBERTa models.
  • Figure 5: Impact in terms of $\Delta$ in accuracy in-cluster versus out-of-cluster across six BERT clusters when pruning the top 10 attention heads. Asterisks (*) show where the difference in distribution of the $\Delta$ values is significant at $\alpha\leq0.001$ after applying Bonferroni correction (Dunn, 1961); see the appendix.
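Figures 3 and 5 report the change in accuracy ($\Delta$) after pruning the top $n$ heads. A minimal sketch of that measurement follows, assuming heads are ranked by SHV and pruning means gating them off. The SHV dictionary and `toy_evaluate` are hypothetical placeholders, not values or functions from the paper.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical SHVs for four heads; real values would come from an attribution run.
TOY_SHV: Dict[str, float] = {"L0H0": 0.30, "L0H1": 0.05, "L1H0": 0.10, "L1H1": 0.0}


def toy_evaluate(active_heads: FrozenSet[str]) -> float:
    """Stand-in for model accuracy on a paradigm with only these heads active."""
    return 0.5 + sum(TOY_SHV[h] for h in active_heads)


def top_n_heads(shv: Dict[str, float], n: int) -> List[str]:
    """Return the n heads with the highest SHV, highest first."""
    return sorted(shv, key=shv.get, reverse=True)[:n]


def delta_accuracy_after_pruning(
    shv: Dict[str, float],
    evaluate: Callable[[FrozenSet[str]], float],
    n: int,
) -> float:
    """Accuracy with the top-n heads gated off minus the unpruned baseline."""
    all_heads = frozenset(shv)
    pruned = frozenset(top_n_heads(shv, n))
    return evaluate(all_heads - pruned) - evaluate(all_heads)


if __name__ == "__main__":
    for n in range(len(TOY_SHV) + 1):
        print(n, round(delta_accuracy_after_pruning(TOY_SHV, toy_evaluate, n), 3))
```

For an in-cluster versus out-of-cluster comparison in the spirit of Figure 5, the same function could be called with SHVs from one paradigm and an `evaluate` function built on a paradigm from either the same cluster or a different one.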