Probing BERT for German Compound Semantics

Filip Miletić, Aaron Schmid, Sabine Schulte im Walde

TL;DR

The paper investigates whether pretrained German BERT encodes noun compound semantics by predicting the compositionality of 868 German noun-noun compounds. It adapts an English probing framework, deriving a range of BERT-based compositionality estimates that vary the target embeddings, the layer spans, and the cased vs. uncased model, grounded in the GHoSt-NN dataset and the DECOW corpus. The strongest results reach $\rho=0.433$ for heads and $\rho=0.332$ for modifiers, but German performance lags behind English overall, likely because compounding is more productive in German and constituents are more ambiguous; compositionality information is most easily recoverable in the early transformer layers. The findings reveal cross-lingual parallels in how BERT encodes compositionality and motivate probing further languages and model variants to better understand language-specific effects.

Abstract

This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
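
The evaluation loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a compositionality estimate is the cosine similarity between a compound vector and a constituent vector, each averaged over a contiguous span of layers, and that $\rho$ denotes Spearman's rank correlation against gold ratings. All function names are hypothetical, and the vectors are plain Python lists standing in for BERT hidden states.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def span_estimate(compound_layers, constituent_layers, start, end):
    """Assumed estimate: cosine similarity of the compound and constituent
    vectors, each averaged over layers start..end (inclusive)."""
    compound = mean_vector(compound_layers[start:end + 1])
    constituent = mean_vector(constituent_layers[start:end + 1])
    return cosine(compound, constituent)


def layer_spans(num_layers):
    """All contiguous (start, end) layer spans with start <= end."""
    return [(s, e) for s in range(num_layers) for e in range(s, num_layers)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In such a setup, one would compute `span_estimate` for every compound under each `(start, end)` pair returned by `layer_spans`, then score each span by `spearman` against the gold compositionality ratings, separately for modifiers and heads.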

Paper Structure

This paper contains 16 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Mean performance across contiguous spans of layers, defined by the start layer (x-axis) and end layer (y-axis). Left: uncased model; right: cased model. Top: modifier predictions; bottom: head predictions.
  • Figure 2: Layer-wise difference in cased vs. uncased model performance. Positive values: better performance of the cased model. Negative values: better performance of the uncased model.