Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification
Vivi Nastase, Paola Merlo
TL;DR
This work investigates how linguistic information is encoded in transformer-based sentence embeddings. Within a two-level VAE architecture, a CNN-based encoder operates on sentence embeddings reshaped to $32\times24$ and compresses them into a latent vector of length $5$; the authors then apply targeted sparsification to this encoder. By enforcing disjoint connections from CNN channels to latent units and tracing signals back to embedding regions, they localize chunk-structure information (noun, verb, and prepositional phrases) to specific small regions of the embedding. On chunk-focused problems (Blackbird Language Matrices), task performance drops only modestly under sparsification, and the chunk information is recovered and localized in the bottom portions of the embedding. These findings advance the explainability of transformer sentence representations and suggest concrete directions for building interpretable neural models for structured linguistic tasks.
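The idea of disjoint connections from CNN channels to latent units can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the channel count, kernel size, ReLU activation, and round-robin channel assignment are assumptions chosen for illustration, and only the $32\times24$ reshaping and the latent size of $5$ come from the summary above.

```python
import numpy as np

EMB_DIM = 768        # 32 * 24; assumes a 768-dimensional sentence embedding
GRID = (32, 24)      # reshaped embedding, as stated in the summary
LATENT = 5           # latent vector length, as stated in the summary
N_CHANNELS = 10      # hypothetical number of CNN channels
KERNEL = (3, 3)      # hypothetical kernel size


def conv2d_valid(x, kernels):
    """Valid 2D cross-correlation: x (H, W), kernels (C, kh, kw) -> (C, H', W')."""
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1:])
    return np.einsum("ijkl,ckl->cij", windows, kernels)


def disjoint_mask(n_channels, feats_per_channel, latent):
    """Binary mask of shape (n_channels * feats_per_channel, latent).

    Every feature of a channel is connected to exactly one latent unit
    (round-robin assignment, an assumption made for this sketch).
    """
    owner = np.arange(n_channels) % latent          # channel -> latent unit
    mask = np.zeros((n_channels, feats_per_channel, latent))
    mask[np.arange(n_channels), :, owner] = 1.0
    return mask.reshape(n_channels * feats_per_channel, latent)


def encode(embedding, kernels, weights, mask):
    """Reshape the embedding, apply the CNN, then a masked linear layer."""
    grid = embedding.reshape(GRID)
    feats = np.maximum(conv2d_valid(grid, kernels), 0.0)   # ReLU (assumption)
    return feats.reshape(-1) @ (weights * mask)


rng = np.random.default_rng(0)
kernels = rng.normal(size=(N_CHANNELS, *KERNEL))
feats_per_channel = (GRID[0] - KERNEL[0] + 1) * (GRID[1] - KERNEL[1] + 1)
mask = disjoint_mask(N_CHANNELS, feats_per_channel, LATENT)
weights = rng.normal(size=mask.shape)
embedding = rng.normal(size=EMB_DIM)    # stand-in for a real sentence embedding

latent = encode(embedding, kernels, weights, mask)
```

Because each channel feeds a single latent unit, silencing one channel can only change the latent unit it is wired to. That property is what makes it possible, in a setup like this, to trace a latent unit back through its channels to the embedding regions their kernels respond to.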
Abstract
Analyses of transformer-based models have shown that they encode a variety of linguistic information from their textual input. While these analyses have shed light on the relation between linguistic information on one side, and internal architecture and parameters on the other, a question remains unanswered: how is this linguistic information reflected in sentence embeddings? Using datasets consisting of sentences with known structure, we test to what degree information about chunks (in particular noun, verb, or prepositional phrases), such as their grammatical number or semantic role, can be localized in sentence embeddings. Our results show that such information is not distributed over the entire sentence embedding, but is instead encoded in specific regions. Understanding how the information from an input text is compressed into sentence embeddings helps us understand current transformer models and build future explainable neural models.
