Probing structural constraints of negation in Pretrained Language Models

David Kletz, Marie Candito, Pascal Amsili

TL;DR

The paper addresses how pretrained language models encode negation and its formal impact, focusing on negation scope and NPI licensing. It adopts probe-based methods across BERT and RoBERTa variants, evaluating whether contextual embeddings predict the presence of not and the polarity of masked NPIs. Findings show that embeddings inside the negation scope better support both not presence and NPI polarity predictions, yet deeper analyses reveal that these effects largely reflect general clause-boundary knowledge, with distance to the negation clue being a strong factor and model-dependent magnitudes. The work highlights that while PLMs capture some structural constraints related to negation, their behavior aligns closely with syntactic boundaries, informing how negation is interpreted in NLP systems and guiding further investigations into linguistic structure in PLMs.

Abstract

Contradictory results about the encoding of the semantic impact of negation in pretrained language models (PLMs) have been reported recently (e.g. Kassner and Schütze (2020); Gubelmann and Handschuh (2022)). In this paper we focus instead on the way PLMs encode negation and its formal impact, through the phenomenon of Negative Polarity Item (NPI) licensing in English. More precisely, we use probes to identify which contextual representations best encode 1) the presence of negation in a sentence, and 2) the polarity of a neighboring masked polarity item. We find that contextual representations of tokens inside the negation scope do allow for (i) a better prediction of the presence of not compared to those outside the scope and (ii) a better prediction of the right polarity of a masked polarity item licensed by not, although the magnitude of the difference varies from PLM to PLM. Importantly, in both cases the trend holds even when controlling for distance to not. This tends to indicate that the embeddings of these models do reflect the notion of negation scope, and do encode the impact of negation on NPI licensing. Yet, further control experiments reveal that the presence of other lexical items is also better captured when using the contextual representation of a token within the same syntactic clause than one outside it, suggesting that PLMs simply capture the more general notion of syntactic clause.
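To make the notion of a probe concrete, here is a minimal sketch of a binary probing classifier: a logistic regression trained on fixed vectors to predict a binary label such as "the sentence contains not". Everything in it is an assumption for illustration: the function names (`toy_embeddings`, `train_probe`, `probe_accuracy`) are hypothetical, the vectors are synthetic stand-ins rather than real BERT or RoBERTa embeddings, and the paper's actual classifier architecture and training setup may differ.

```python
import math
import random


def toy_embeddings(n, dim=4, seed=0):
    """Synthetic stand-ins for contextual embeddings (NOT real PLM output).

    Label 1 plays the role of "sentence contains not", label 0 of "no not".
    The label is leaked into dimension 0 so that a linear probe can find it.
    """
    rng = random.Random(seed)
    vectors, labels = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        x[0] += 2.0 if y == 1 else -2.0
        vectors.append(x)
        labels.append(y)
    return vectors, labels


def _score(weights, bias, x):
    """Linear score of one vector under the probe."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_probe(vectors, labels, epochs=200, lr=0.1):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(vectors[0])
    n = len(vectors)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            # Clamp the score so math.exp cannot overflow.
            z = max(-30.0, min(30.0, _score(weights, bias, x)))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, vectors, labels):
    """Fraction of vectors whose predicted label matches the gold label."""
    hits = sum(
        int((_score(weights, bias, x) >= 0.0) == bool(y))
        for x, y in zip(vectors, labels)
    )
    return hits / len(vectors)


if __name__ == "__main__":
    train_x, train_y = toy_embeddings(200, seed=0)
    test_x, test_y = toy_embeddings(100, seed=1)
    w, b = train_probe(train_x, train_y)
    print(probe_accuracy(w, b, test_x, test_y))
```

In an actual probing setup, the synthetic vectors would be replaced by the contextual representation of a chosen token taken from the frozen PLM, and the probe's accuracy would then be compared across token positions (for instance inside versus outside the negation scope).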

Paper Structure (18 sections, 6 figures, 10 tables)

This paper contains 18 sections, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Accuracy of the ROBERTA-large-neg-classifier (average on 3 runs) on the not+NPI test set, broken down by zone (colors of the bars) and by relative position to not (horizontal axis). Further distances are omitted for clarity. No licensing scope contains fewer than 2 tokens, hence positions 1 and 2 are always in the IN zone. The bar differences at each position and run are statistically significant at $p<0.001$ (cf. Appendix \ref{app:significance}). Figures for the other 3 models are provided in appendix figure \ref{fig:accuracy_other_neg_classifiers}.
  • Figure 2: Illustration of the training of the pol-classifiers.
  • Figure 3: Accuracy of the ROBERTA-large-pol-classifier (average on 3 runs) on the not+NPI test set, broken down by zone (colors of the bars) and by relative position to not (horizontal axis). Further distances are omitted for clarity. No licensing scope contains fewer than 2 tokens, hence positions 1 and 2 are always in the IN zone. The bar differences at each position and run are statistically significant at $p<0.001$ (cf. appendix figures \ref{fig:accuracy_other_pol_classifiers}).
  • Figure 4: Accuracy (average on 3 runs) of the other neg-classifiers (BERT-base, BERT-large and ROBERTA-base) on the not+NPI test set, broken down by zone (colors of the bars) and by relative position to not (horizontal axis). Further distances are omitted for clarity. No licensing scope contains fewer than 2 tokens, hence positions 1 and 2 are always in the IN zone. The bar differences at each position and run are statistically significant at $p<0.001$ (cf. Appendix \ref{app:significance}).
  • Figure 5: Accuracy (average on 3 runs) of the other pol-classifiers (BERT-base, BERT-large and ROBERTA-base) on the not+NPI test set, broken down by zone (colors of the bars) and by relative position to not (horizontal axis). Further distances are omitted for clarity. No licensing scope contains fewer than 2 tokens, hence positions 1 and 2 are always in the IN zone. The bar differences at each position and run are statistically significant at $p<0.001$ (cf. Appendix \ref{app:significance}).
  • ...and 1 more figures
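
The figure captions above describe accuracy broken down by zone and by relative position to not. A minimal sketch of that kind of breakdown follows, assuming each probe prediction is stored as a `(zone, position, correct)` tuple; this record layout and the function name `accuracy_by_zone_and_position` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def accuracy_by_zone_and_position(records):
    """Group probe predictions by (zone, position) and return accuracy per group.

    `records` is an iterable of (zone, position, correct) tuples, where
    `zone` is a label such as "IN" or "OUT", `position` is the token's
    distance to not, and `correct` says whether the probe was right.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for zone, position, correct in records:
        key = (zone, position)
        totals[key] += 1
        hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Made-up records for illustration only; not results from the paper.
    demo = [("IN", 1, True), ("IN", 1, False), ("OUT", 3, True)]
    print(accuracy_by_zone_and_position(demo))
```

Comparing the IN and OUT entries at the same position is what controls for distance to not: any remaining gap cannot be explained by the token simply being closer to the negation.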