Hitting "Probe"rty with Non-Linearity, and More

Avik Pal, Madhura Pawar

TL;DR

This work investigates how dependency syntax is encoded in transformer hidden states by moving beyond linear structural probes to non-linear variants, with a focus on BERT and BERT$_\text{LARGE}$. It introduces a simplified non-linear probing framework and a qualitative visualization of word-pair connectivity, and finds that the Radial Basis Function (RBF) probe captures syntactic structure across layers more effectively than the linear probe. Through evaluation with the undirected unlabeled attachment score (UUAS) and strength-based visualizations, the study traces how syntactic encoding changes from layer to layer, suggests that deeper layers tend to specialize in task-relevant syntax, and shows that context is crucial for capturing these relations. The results underscore the value of non-linear probing and edge-strength analysis for understanding how transformers encode syntax, and they motivate evaluation that goes beyond UUAS and further study of layer-wise linguistic representations.

Abstract

Structural probes learn a linear transformation to find how dependency trees are embedded in the hidden states of language models. This simple design may not fully exploit the structure of the encoded information. To investigate that structure more completely, we incorporate non-linear structural probes. We reformulate the non-linear structural probes introduced by White et al., making their design simpler yet still effective. We also design a visualization framework that lets us qualitatively assess how strongly two words in a sentence are connected in the predicted dependency tree. We use this technique to understand which non-linear probe variant is best at recovering syntactic information, and to qualitatively investigate the structure of the dependency trees that BERT encodes in each of its layers. We find that the radial basis function (RBF) probe is more effective for the BERT model than the linear probe.
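To make the probing setup concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard structural-probe distance $\|B(h_i - h_j)\|^2$ and, for the RBF variant, the squared distance induced by an RBF kernel in its feature space, $2 - 2\exp(-\gamma \|B(h_i - h_j)\|^2)$. The function names, the `gamma` parameter, and the toy inputs are hypothetical; the paper's exact reformulation is not given in this summary.

```python
import numpy as np

def linear_probe_sq_dist(H, B):
    """Pairwise squared distances ||B(h_i - h_j)||^2.
    H: (n_words, d_model) hidden states, B: (rank, d_model) probe matrix."""
    Z = H @ B.T                               # project every word: (n_words, rank)
    diff = Z[:, None, :] - Z[None, :, :]      # all pairwise differences via broadcasting
    return (diff ** 2).sum(axis=-1)

def rbf_probe_sq_dist(H, B, gamma=1.0):
    """Assumed RBF variant: k(x,x) - 2k(x,y) + k(y,y) with
    k(x,y) = exp(-gamma * ||B(x - y)||^2), which equals 2 - 2k(x,y)."""
    return 2.0 - 2.0 * np.exp(-gamma * linear_probe_sq_dist(H, B))

def minimum_spanning_tree(D):
    """Prim's algorithm on a dense distance matrix.
    Returns undirected edges as (i, j) tuples with i < j."""
    n = D.shape[0]
    in_tree = [0]
    edges = set()
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or D[i, j] < best[0]):
                    best = (D[i, j], i, j)
        _, i, j = best
        edges.add((min(i, j), max(i, j)))
        in_tree.append(j)
    return edges

def uuas(pred_edges, gold_edges):
    """Undirected unlabeled attachment score: the fraction of gold edges
    that also appear in the predicted tree, ignoring direction."""
    gold = {(min(i, j), max(i, j)) for i, j in gold_edges}
    return len(pred_edges & gold) / len(gold)

# Toy example (hypothetical data): four "words" placed on a line,
# whose gold tree is the chain 0-1-2-3.
H = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
B = np.eye(2)
gold = [(0, 1), (1, 2), (2, 3)]
tree_linear = minimum_spanning_tree(linear_probe_sq_dist(H, B))
tree_rbf = minimum_spanning_tree(rbf_probe_sq_dist(H, B, gamma=0.5))
score_linear = uuas(tree_linear, gold)
score_rbf = uuas(tree_rbf, gold)
```

In this sketch the RBF distance is a monotone function of the linear one for a fixed $B$, so both yield the same spanning tree on the toy input. Any difference between the two probes would therefore come from training, where the predicted distances are fit to gold tree distances and the two forms lead to different learned matrices $B$.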

Paper Structure (14 sections, 8 equations, 7 figures)

Figures (7)

  • Figure 1: UUAS across BERT and BERT$_\text{LARGE}$ layers
  • Figure 2: Dependency trees for BERT Layer 12 by Linear Probe (above) and RBF Probe (below). Edges in black are the gold trees.
  • Figure 3: Dependency trees from predicted squared distances on BERT by RBF Probe for layers $\in \{3,6,12\}$
  • Figure 4: UUAS Scores for varying rank of matrix $B$ where rank $\in \{1, 2, 4, 8, 16, 32, 64, 128, 256 \}$
  • Figure 5: UUAS Scores for non-contextualized representations
  • ...and 2 more figures