Probing the Category of Verbal Aspect in Transformer Language Models

Anisia Katinskaia, Roman Yangarber

TL;DR

This work examines how transformer-based language models encode Russian verbal aspect (perfective vs. imperfective), with particular attention to the complicating factor of alternative contexts, where either aspect is acceptable. It combines behavioral probing (iterative masking and aspect inference) with causal probing (AlterRep-INLP counterfactuals) and finds that aspect information is captured primarily in the final layers, and that boundedness semantics modulates aspect predictions in a way consistent with grammatical theory. The authors further show that fine-tuning only the last layers yields faster, more effective aspect prediction than fine-tuning the whole model, while alternative contexts induce higher predictive uncertainty and stronger sensitivity to semantic interventions. The study contributes a data-rich, linguistically informed methodology and an accompanying release, with implications for morphologically rich languages and for educational tools that rely on robust aspect prediction in context.

Abstract

We investigate how pretrained language models (PLMs) encode the grammatical category of verbal aspect in Russian. Encoding of aspect in transformer LMs has not been studied previously in any language. A particular challenge is posed by "alternative contexts", where either the perfective or the imperfective aspect is suitable grammatically and semantically. We perform probing using BERT and RoBERTa on alternative and non-alternative contexts. First, we assess the models' performance on aspect prediction, via behavioral probing. Next, we examine the models' performance when their contextual representations are substituted with counterfactual representations, via causal probing. These counterfactuals alter the value of the "boundedness" feature, a semantic feature that characterizes the action in the context. Experiments show that BERT and RoBERTa do encode aspect, mostly in their final layers. The counterfactual interventions affect perfective and imperfective in opposite ways, which is consistent with grammar: perfective is positively affected by adding the meaning of boundedness, and vice versa. The practical implications of our probing results are that fine-tuning only the last layers of BERT on predicting aspect is faster and more effective than fine-tuning the whole model. The model has high predictive uncertainty about aspect in alternative contexts, which tend to lack explicit hints about the boundedness of the described action.
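
As a rough illustration of the counterfactual substitution described above, the sketch below pushes a representation toward the "bounded" or "unbounded" side of a single linear probe direction, in the spirit of AlterRep. It is a minimal sketch under stated assumptions, not the authors' implementation: the single direction `w`, the toy vector `h`, the scaling factor `alpha`, and the function names are all hypothetical, and the actual procedure uses directions learned with INLP on real BERT and RoBERTa representations.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(w):
    """Scale w to unit length."""
    norm = math.sqrt(dot(w, w))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in w]

def counterfactual(h, w, alpha, positive):
    """Return h moved to the positive or negative side of direction w.

    Assumption: one probe direction only. The component of h along w is
    removed (nullspace projection), then re-added with a fixed sign and
    scaled by alpha. If h is orthogonal to w, h is returned unchanged.
    """
    w = normalize(w)
    proj = dot(h, w)
    # Part of h that carries no information along w.
    h_null = [x - proj * wi for x, wi in zip(h, w)]
    sign = 1.0 if positive else -1.0
    return [x + sign * alpha * abs(proj) * wi for x, wi in zip(h_null, w)]

# Toy values, not taken from the paper.
h = [0.5, -1.0, 2.0]   # hypothetical contextual representation
w = [0.0, 3.0, 4.0]    # hypothetical "boundedness" probe direction
h_pos = counterfactual(h, w, alpha=2.0, positive=True)   # toward bounded
h_neg = counterfactual(h, w, alpha=2.0, positive=False)  # toward unbounded
```

After the intervention, the two counterfactuals differ only along `w`: their projections onto the unit direction have equal magnitude and opposite sign, while the remaining components of `h` are untouched.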

Paper Structure

This paper contains 20 sections, 1 equation, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Performance of BERT-large on iterative masking (left) and aspect inference (right) for target verbs. Perf and Imp denote perfective and imperfective aspect in non-alternative (NonAlt) and alternative (Alt) contexts. Black dotted lines indicate random guessing between perfective and imperfective.
  • Figure 2: Causal model of dependencies between intended meaning (M) of instance, lemma of target verb (L), context (C), choice of aspect (A), and contextual representation (R) of target verb.
  • Figure 3: Percentage of sentences with detected cue words where target verb is perfective or imperfective.
  • Figure 4: Accuracy of predicting the correct (expected) aspect, using the aspect inference method after intervention on BERT-large representations. Top plots: non-alternative contexts; bottom plots: alternative contexts. Left plots: negative intervention, toward unbounded action. Right plots: positive intervention, toward bounded action. Flat lines show performance before intervention; dots show performance after intervention. Dashed lines show performance after random interventions.
  • Figure 5: t-SNE visualization of representations from BERT-large layer 24 for masked target verbs in non-alternative (left 3 plots) and alternative (right 3 plots) contexts using A. pretrained models, B. fine-tuned models, and C. models with fine-tuned last layers. Orange indicates imperfective, and blue indicates perfective.
  • ...and 9 more figures
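
The dotted lines in Figure 1 mark chance level for a two-way choice between perfective and imperfective. The sketch below shows one plausible way to turn a masked LM's candidate predictions into such a two-way decision and to quantify its uncertainty. It is an assumption-laden illustration rather than the paper's procedure: the aggregation by aspect, the use of binary entropy as the uncertainty measure, the function names, and all probabilities are hypothetical toy values, not reported results.

```python
import math

def aspect_distribution(candidates):
    """Aggregate candidate probabilities into a two-way aspect distribution.

    candidates: list of (aspect, probability) pairs, where aspect is
    "perf" or "imp". The aspect labels are assumed to come from an
    external morphological analysis of each predicted verb form.
    """
    mass = {"perf": 0.0, "imp": 0.0}
    for aspect, p in candidates:
        mass[aspect] += p
    total = mass["perf"] + mass["imp"]
    if total == 0:
        raise ValueError("no probability mass on verb candidates")
    return {a: m / total for a, m in mass.items()}

def binary_entropy(p):
    """Entropy in bits of a two-way distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Toy candidate lists for a masked target verb (invented numbers).
non_alt = [("perf", 0.6), ("perf", 0.2), ("imp", 0.05)]
alt = [("perf", 0.3), ("imp", 0.3)]

u_non_alt = binary_entropy(aspect_distribution(non_alt)["perf"])
u_alt = binary_entropy(aspect_distribution(alt)["perf"])
```

In this toy setup the alternative context splits its mass evenly between the two aspects, so its entropy reaches the maximum of 1 bit, which corresponds to the chance-level line, while the non-alternative context yields a lower value.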