Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging

Enora Rice, Ali Marashian, Hannah Haynie, Katharina von der Wense, Alexis Palmer

TL;DR

This work investigates how typology, data characteristics, and model architecture shape transfer-language selection for zero-shot cross-lingual POS tagging. It builds a ranking framework using gradient-boosted trees trained on features from typological databases (URIEL and Grambank) and corpus statistics, evaluating both bilingual biLSTMs and pretrained MLMs (XLM-R, M-BERT). The key finding is that both typological and dataset-dependent features independently contribute to transfer-language rankings, with the best performance achieved by combining them; among fine-grained typologies, Grambank generally provides stronger signals than URIEL, and features like word overlap, type-token ratio, and genealogical distance are consistently informative. These results offer interpretable guidance for selecting source languages to improve zero-shot cross-lingual POS tagging, particularly for under-resourced languages, and highlight architecture-specific patterns in transfer efficiency.

Abstract

Cross-lingual transfer learning is an invaluable tool for overcoming data scarcity, yet selecting a suitable transfer language remains a challenge. The precise roles of linguistic typology, training data, and model architecture in transfer language choice are not fully understood. We take a holistic approach, examining how both dataset-specific and fine-grained typological features influence transfer language selection for part-of-speech tagging, considering two different sources for morphosyntactic features. While previous work examines these dynamics in the context of bilingual biLSTMs, we extend our analysis to a more modern transfer learning pipeline: zero-shot prediction with pretrained multilingual models. We train a series of transfer language ranking systems and examine how different feature inputs influence ranker performance across architectures. Word overlap, type-token ratio, and genealogical distance emerge as top features across all architectures. Our findings reveal that a combination of typological and dataset-dependent features leads to the best rankings, and that good performance can be obtained with either feature group on its own.
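To make the two dataset-dependent features named above concrete, here is a minimal sketch of how type-token ratio and word overlap might be computed from tokenized corpora. The exact definitions used in the paper are not given in this summary: the overlap formula below (shared types divided by the summed vocabulary sizes) is an assumption, and the function names and toy data are hypothetical. The final helper ranks candidates by a single feature only; it is a simple baseline for illustration, not the gradient-boosted ranker described in the paper.

```python
def type_token_ratio(tokens):
    """Distinct word types divided by total tokens (0.0 for an empty corpus)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def word_overlap(source_tokens, target_tokens):
    """Shared types normalised by the summed vocabulary sizes.

    Assumed definition: |S & T| / (|S| + |T|), where S and T are the
    sets of word types in the source and target corpora.
    """
    src, tgt = set(source_tokens), set(target_tokens)
    if not src and not tgt:
        return 0.0
    return len(src & tgt) / (len(src) + len(tgt))


def rank_by_overlap(target_tokens, candidates):
    """Order candidate transfer languages by word overlap with the target.

    `candidates` maps a (hypothetical) language label to its token list.
    This is a one-feature baseline, not a learned ranker.
    """
    scores = {
        name: word_overlap(tokens, target_tokens)
        for name, tokens in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented purely for illustration.
target = "la casa es grande".split()
candidates = {
    "cand_a": "la casa es roja".split(),
    "cand_b": "das haus ist gross".split(),
}
ranking = rank_by_overlap(target, candidates)
```

In a fuller pipeline, values like these would be one part of a feature vector for each (target, candidate) pair, alongside typological distances, and a learned ranker would combine them.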

Paper Structure

This paper contains 25 sections, 2 equations, 8 tables.