Probing BERT for German Compound Semantics
Filip Miletić, Aaron Schmid, Sabine Schulte im Walde
TL;DR
The paper investigates whether pretrained German BERT encodes noun compound semantics by predicting the compositionality of 868 German noun-noun compounds. It adapts an English probing framework, computing a range of BERT-derived compositionality estimates across target embeddings, layer spans, and cased vs. uncased models, drawing on the GHoSt-NN dataset and the DECOW corpus. The strongest results reach $\rho=0.433$ for heads and $\rho=0.332$ for modifiers, but overall German performance lags behind English, likely due to the higher productivity of compounding in German and the associated constituent-level ambiguity; compositionality information is most easily recoverable in the early transformer layers. The findings reveal cross-lingual parallels in how BERT encodes compositionality and motivate broader probing across more languages and model variants to better understand language-specific effects.
Abstract
This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
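To make the evaluation setup concrete, the following is a minimal sketch of how a compositionality estimate could be scored against gold ratings. It rests on assumptions not stated in the text above: that an estimate is the cosine similarity between a compound embedding and a constituent (head or modifier) embedding, and that $\rho$ denotes Spearman's rank correlation. The function names and toy vectors are hypothetical and are not taken from the paper or its data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(items):
    """Score (compound_vec, constituent_vec, gold_rating) triples.

    Assumption: the predicted compositionality of a compound with respect
    to a constituent is the cosine similarity of their embeddings.
    """
    predicted = [cosine(compound, constituent) for compound, constituent, _ in items]
    gold = [rating for _, _, rating in items]
    return spearman(predicted, gold)


if __name__ == "__main__":
    # Toy vectors and ratings, invented purely for illustration.
    toy_items = [
        ([1.0, 0.0, 0.2], [0.9, 0.1, 0.3], 5.1),
        ([0.2, 1.0, 0.0], [0.8, 0.3, 0.1], 2.4),
        ([0.5, 0.5, 0.5], [0.6, 0.4, 0.7], 4.0),
    ]
    print(f"Spearman rho on toy data: {evaluate(toy_items):.3f}")
```

In a real setup, the vectors would come from a German BERT model, pooled over the chosen target tokens and layer span, and `evaluate` would be run separately for head and modifier ratings. A rank correlation is a natural choice here because it compares orderings only, so the raw similarity scores do not need to be calibrated to the rating scale.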
