On the Interplay between Positional Encodings, Morphological Complexity, and Word Order Flexibility
Kushal Tatariya, Wessel Poelman, Miryam de Lhoneux
TL;DR
This study questions whether English-centric positional encodings generalize to typologically diverse languages by training three encoding variants (no-pos, absolute, relative) of monolingual RoBERTa-base models across seven languages and evaluating them on four downstream tasks. Using gradient-based proxies for morphological complexity and word order flexibility, the authors find no robust, language-wide interaction between encoding type and typology; instead, performance depends heavily on the task and language, with relative encodings offering the most consistent results. The findings challenge the idea that morphological complexity and word order flexibility jointly modulate the usefulness of positional encodings and emphasize the importance of task- and language-aware encoding choices in multilingual NLP. The work also highlights limitations in proxy measures and suggests developing token-based, language-specific word order metrics for more reliable cross-language conclusions.
Abstract
Language model architectures are predominantly first created for English and subsequently applied to other languages. It is an open question whether this architectural bias leads to degraded performance for languages that are structurally different from English. We examine one specific architectural choice, positional encodings, through the lens of the trade-off hypothesis: the supposed interplay between morphological complexity and word order flexibility. This hypothesis posits that a more morphologically complex language can have a more flexible word order, and vice versa. Positional encodings are a direct target for investigating the implications of this hypothesis for language modelling. We pretrain monolingual model variants with absolute, relative, and no positional encodings for seven typologically diverse languages and evaluate them on four downstream tasks. Contrary to previous findings, we do not observe a clear interaction between positional encodings and morphological complexity or word order flexibility, as measured by various proxies. Our results show that the choice of tasks, languages, and metrics is essential for drawing stable conclusions.
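To make the three variants concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes a single attention head with identity projections, a hypothetical learned absolute position table (`pos_table`) added to the token embeddings, and a hypothetical scalar bias per relative offset (`rel_table`) added to the attention scores. The specific relative scheme used in the paper is not stated in the abstract, so this form is an assumption. The sketch checks whether each variant is indifferent to token order, which is the property that ties positional encodings to word order flexibility.

```python
import math
import random


def attention(x, bias=None):
    """Single-head self-attention with identity projections (illustration only)."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Scaled dot-product scores of token i against every token j.
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in range(n)]
        if bias is not None:
            scores = [s + bias[i][j] for j, s in enumerate(scores)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(exps[j] / z * x[j][k] for j in range(n)) for k in range(d)])
    return out


random.seed(0)
N, D = 5, 4  # hypothetical sequence length and embedding size
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
pos_table = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]  # assumed absolute table
rel_table = [random.gauss(0, 1) for _ in range(2 * N - 1)]  # assumed bias per offset j - i


def encode(x, variant):
    n = len(x)
    if variant == "no-pos":
        return attention(x)
    if variant == "absolute":
        # Position information is added to the input embeddings.
        return attention([[a + p for a, p in zip(x[i], pos_table[i])] for i in range(n)])
    if variant == "relative":
        # Position information enters as a score bias indexed by the offset j - i.
        bias = [[rel_table[j - i + N - 1] for j in range(n)] for i in range(n)]
        return attention(x, bias)
    raise ValueError(variant)


def is_order_equivariant(variant, perm):
    """True if encoding shuffled tokens equals shuffling the encoded tokens."""
    shuffled_then_encoded = encode([tokens[p] for p in perm], variant)
    encoded = encode(tokens, variant)
    encoded_then_shuffled = [encoded[p] for p in perm]
    return all(
        math.isclose(a, b, abs_tol=1e-9)
        for row_a, row_b in zip(shuffled_then_encoded, encoded_then_shuffled)
        for a, b in zip(row_a, row_b)
    )


perm = [4, 3, 2, 1, 0]  # reverse the token order
for variant in ("no-pos", "absolute", "relative"):
    print(variant, is_order_equivariant(variant, perm))
```

Under these assumptions, only the no-pos variant should be order-equivariant: without positional information, unmasked self-attention treats its input as a bag of tokens, whereas both the absolute and relative variants produce outputs that depend on where each token appears.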
