Hitting "Probe"rty with Non-Linearity, and More
Avik Pal, Madhura Pawar
TL;DR
This work investigates how dependency syntax is encoded in transformer hidden states by moving beyond linear structural probes to non-linear variants, with a focus on BERT and BERT$_{LARGE}$. It introduces a simplified non-linear probing framework and a qualitative visualization of word-pair connectivity, finding that the Radial Basis Function (RBF) probe most effectively captures syntactic structure across layers. Through evaluation with the undirected unlabeled attachment score (UUAS) and strength-based visualizations, the study reveals layer-wise dynamics of syntactic encoding and shows that deeper layers tend to specialize for task-relevant syntax, while context is crucial for capturing these relations. The results underscore the value of non-linear probing and edge-strength analyses for understanding how transformers encode syntax, suggest improvements to evaluation beyond UUAS, and can guide future explorations of layer-wise linguistic representations.
Abstract
Structural probes learn a linear transformation to find how dependency trees are embedded in the hidden states of language models. This simple design may not fully exploit the structure of the encoded information. To investigate that structure more fully, we incorporate non-linear structural probes. We reformulate the non-linear structural probes introduced by White et al., making their design simpler yet effective. We also design a visualization framework that lets us qualitatively assess how strongly two words in a sentence are connected in the predicted dependency tree. We use this technique to understand which non-linear probe variant is good at encoding syntactic information, and to qualitatively investigate the structure of the dependency trees that BERT encodes in each of its layers. We find that the radial basis function (RBF) probe is a more effective probe for the BERT model than the linear probe.
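To make the contrast between a linear and an RBF probe concrete, the following is a minimal sketch rather than the formulation used in this work. The linear distance is the standard squared norm of the projected difference, $\|B(h_i - h_j)\|^2$. The RBF variant is an assumption for illustration: it takes the squared distance induced by the kernel $k(x, y) = \exp(-\gamma \|x - y\|^2)$ on the projected vectors. The function names, the toy hidden states, the untrained matrix `B`, and the value of `gamma` are all hypothetical.

```python
import math

def project(B, h):
    """Apply the probe matrix B (rank x hidden_dim) to one vector h."""
    return [sum(b * x for b, x in zip(row, h)) for row in B]

def linear_probe_distance(B, h_i, h_j):
    """Squared distance ||B(h_i - h_j)||^2 between two word vectors."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    return sum(v * v for v in project(B, diff))

def rbf_probe_distance(B, h_i, h_j, gamma=1.0):
    """Assumed RBF variant: squared distance in the feature space of
    k(x, y) = exp(-gamma * ||x - y||^2), applied to projected vectors.
    Since k(x, x) = 1, this equals 2 - 2 * k(B h_i, B h_j)."""
    return 2.0 - 2.0 * math.exp(-gamma * linear_probe_distance(B, h_i, h_j))

# Toy hidden states for a 3-word sentence (made-up numbers, hidden_dim = 3).
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
B = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]]  # rank-2 probe, not trained

# Pairwise predicted distances under each probe.
D_lin = [[linear_probe_distance(B, a, b) for b in H] for a in H]
D_rbf = [[rbf_probe_distance(B, a, b, gamma=0.5) for b in H] for a in H]
```

In this sketch the RBF distance is a bounded, monotone transform of the linear one, so it saturates for word pairs that are far apart. A trained probe would fit `B` (and possibly `gamma`) so that these distances approximate tree distances in the gold dependency parse.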
