On the Interplay between Positional Encodings, Morphological Complexity, and Word Order Flexibility

Kushal Tatariya, Wessel Poelman, Miryam de Lhoneux

TL;DR

This study questions whether English-centric positional encodings generalize to typologically diverse languages by training three encoding variants (no-pos, absolute, relative) of monolingual RoBERTa-base models across seven languages and evaluating them on four downstream tasks. Using gradient-based proxies for morphological complexity and word order flexibility, the authors find no robust, language-wide interaction between encoding type and typology; instead, performance depends heavily on the task and language, with relative encodings offering the most consistent results. The findings challenge the idea that morphological complexity and word order flexibility jointly modulate the usefulness of positional encodings and emphasize the importance of task- and language-aware encoding choices in multilingual NLP. The work also highlights limitations in proxy measures and suggests developing token-based, language-specific word order metrics for more reliable cross-language conclusions.

Abstract

Language model architectures are predominantly first created for English and subsequently applied to other languages. It is an open question whether this architectural bias leads to degraded performance for languages that are structurally different from English. We examine one specific architectural choice, positional encodings, through the lens of the trade-off hypothesis: the supposed interplay between morphological complexity and word order flexibility. This hypothesis posits a trade-off between the two: a more morphologically complex language can have a more flexible word order, and vice versa. Positional encodings are a direct target to investigate the implications of this hypothesis in relation to language modelling. We pretrain monolingual model variants with absolute, relative, and no positional encodings for seven typologically diverse languages and evaluate them on four downstream tasks. Contrary to previous findings, we do not observe a clear interaction between positional encodings and morphological complexity or word order flexibility, as measured by various proxies. Our results show that the choice of tasks, languages, and metrics is essential for drawing stable conclusions.

Paper Structure

This paper contains 31 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: We investigate the relationship between positional encodings and morphology by training three model variants per language: no-pos, absolute, and relative. We use seven typologically diverse languages, varying in morphological complexity and word order flexibility.
  • Figure 2: Loss curves on validation set for pretraining for absolute, relative, and no-pos.
  • Figure 3: Results per task, language, and positional encoding type. Scores are averaged over 5 runs. Full results are in the appendix.
  • Figure 4: Relation between proxies for morphological complexity and downstream performance. The line shows the groupings of positional encoding type.
  • Figure 5: Relation between entropic efficiency of the accessor variety and downstream performance.
  • ...and 4 more figures