What do Transformers Know about Government?

Jue Hou, Anisia Katinskaia, Lari Kotilainen, Sathianpong Trangcasanchai, Anh-Duc Vu, Roman Yangarber

TL;DR

The paper investigates how transformer language models encode linguistic government relations, probing BERT on Finnish and Russian and introducing the Government Bank dataset. It shows that government information is encoded across all transformer layers, most strongly in the early layers, and that a small subset of attention heads suffices for high-accuracy prediction and for discovering government patterns never seen in training. By training several probing classifiers on attention-head weights, the authors demonstrate practical utility for expanding linguistic resources, with potential applications in language learning and in research on grammatical constructions. The work also discusses data limitations, errors introduced by parsers, and future work extending to more languages, more models, and construction types beyond government.

Abstract

This paper investigates what insights about linguistic features and what knowledge about the structure of natural language can be obtained from the encodings in transformer language models. In particular, we explore how BERT encodes the government relation between constituents in a sentence. We use several probing classifiers, and data from two morphologically rich languages. Our experiments show that information about government is encoded across all transformer layers, but predominantly in the early layers of the model. We find that, for both languages, a small number of attention heads encode enough information about the government relations to enable us to train a classifier capable of discovering new, previously unknown types of government, never seen in the training data. Currently, data is lacking for the research community working on grammatical constructions, and government in particular. We release the Government Bank -- a dataset defining the government relations for thousands of lemmas in the languages in our experiments.
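
To make the probing setup concrete, here is a minimal sketch of how attention-head weights for a candidate governor/dependent token pair might be flattened into a feature vector for a probing classifier. It is an illustration under stated assumptions, not the authors' implementation: the attention maps are random stand-ins rather than real BERT output, the 12-layer, 12-head shape is assumed, taking both attention directions is assumed, and the helper names `synthetic_attention` and `pair_features` are hypothetical.

```python
import random


def synthetic_attention(n_layers, n_heads, seq_len, seed=0):
    """Random row-normalized attention maps, standing in for real model output."""
    rng = random.Random(seed)
    attn = []
    for _ in range(n_layers):
        layer = []
        for _ in range(n_heads):
            head = []
            for _ in range(seq_len):
                row = [rng.random() for _ in range(seq_len)]
                total = sum(row)
                head.append([w / total for w in row])  # each row sums to 1
            layer.append(head)
        attn.append(layer)
    return attn


def pair_features(attn, gov_idx, dep_idx):
    """One feature per (layer, head, direction) for a governor/dependent pair.

    Using both directions is an assumption made for this sketch.
    """
    feats = []
    for layer in attn:
        for head in layer:
            feats.append(head[gov_idx][dep_idx])  # governor attends to dependent
            feats.append(head[dep_idx][gov_idx])  # dependent attends to governor
    return feats


# Assumed shape: 12 layers x 12 heads, sentence of 8 tokens.
attn = synthetic_attention(n_layers=12, n_heads=12, seq_len=8)
features = pair_features(attn, gov_idx=1, dep_idx=5)
print(len(features))  # 12 layers * 12 heads * 2 directions = 288
```

A vector like `features`, paired with a positive or negative government label, could then be passed to any standard classifier. Restricting the loops to the first N layers or to a chosen subset of heads would give the kind of layer-wise and head-wise comparison the abstract describes.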

Paper Structure (17 sections, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Weights of attention heads of transformer LM as input to probing classifier.
  • Figure 2: t-SNE visualization of positive vs. negative instances. (left: Finnish, right: Russian)
  • Figure 3: Probing government prediction with attention weights from the first N layers of BERT (X-axis) when $Dist>3$. Y-axis: $F_1$ measure. (left: Finnish, right: Russian)
  • Figure 4: $F_1$ score for random forest classifier with selected attention heads when $Dist>3$. (left: Finnish, right: Russian)
  • Figure 5: Probing government prediction with attention weights from the first N layers of BERT (X-axis) when $Dist>2$. Y-axis: $F_1$ measure. (left: Finnish, right: Russian)
  • ...and 1 more figure