Are there identifiable structural parts in the sentence embedding whole?

Vivi Nastase; Paola Merlo

TL;DR

The paper investigates whether sentence embeddings from transformer models contain identifiable structural components. It treats an embedding as overlapping layers of information and uses a CNN-based separation combined with a VAE framework to extract chunk structure and semantic-role information from raw embeddings. The authors report near-perfect chunk identification and show, with a two-level VAE, that chunk information can be used in reasoning tasks in the style of Blackbird Language Matrices (BLMs). This is evidence that fixed-length embeddings hold separable, structure-relevant information. The work clarifies how transformers encode linguistic structure and suggests directions for building more robust, structure-aware language models.

Abstract

Sentence embeddings from transformer models encode much linguistic information in a fixed-length vector. We explore the hypothesis that these embeddings consist of overlapping layers of information that can be separated, and in which specific types of information, such as information about chunks and their structural and semantic properties, can be detected. We show that this is the case using a dataset of sentences with known chunk structure and two linguistic intelligence datasets, whose solution relies on detecting chunks and, respectively, their grammatical number and their semantic roles. Our evidence comes from analyses of performance on the tasks and of the internal representations built during learning.

Paper Structure (31 sections, 14 figures, 5 tables)

Introduction
Related work
Tracing information through a transformer
Word embeddings
Probing models
Data
Sentences
Blackbird Language Matrices
Datasets statistics
Experiments
Parts in sentences
Experimental set-up
Analysis
Electra vs. BERT and RoBERTa, and the price of fine-tuning
...and 16 more sections

Figures (14)

Figure 1: Structure of two BLM problems, in terms of chunks in sentences and sequence structure.
Figure 2: Chunk identification results: t-SNE projections of the latent vectors for the French dataset, and confusion matrix of the system output. The results for English are similar.
Figure 3: The impact on reconstructing sentences with the same pattern when modifying the latent layer with values in their respective min-max range (based on the training data) -- sample confusion matrices.
Figure 4: A two-level VAE: the sentence level learns to compress a sentence into a representation useful to solve the BLM problem on the task level.
Figure 5: VAE vs. 2-level VAE (2xVAE) on the agreement BLM problem.
...and 9 more figures
