Implicit Word Reordering with Knowledge Distillation for Cross-Lingual Dependency Parsing

Zhuoran Li, Chunming Hu, Junfan Chen, Zhijun Chen, Richong Zhang

TL;DR

The paper tackles cross-lingual dependency parsing, where word-order differences hinder transfer. It introduces Implicit Word Reordering with Knowledge Distillation (IWR-KD), a teacher-student framework in which a target-language POS-based teacher guides a source-language parsing student to learn target-like word-order relations in the feature space without generating reordered sentences. Across 31 UD languages, IWR-KD outperforms strong baselines and transfers robustly, especially when the word-order distance is large; ablation studies highlight the value of distillation over hard labels. The approach offers an efficient alternative to explicit reordering, with practical impact for multilingual parsing in low-resource settings.

Abstract

Word order difference between source and target languages is a major obstacle to cross-lingual transfer, especially in the dependency parsing task. Current works mostly rely on order-agnostic models or word reordering to mitigate this problem. However, such methods either do not leverage the grammatical information naturally contained in word order or are computationally expensive, as the permutation space grows exponentially with the sentence length. Moreover, a reordered source sentence with an unnatural word order may act as a form of noise that harms model learning. To this end, we propose an Implicit Word Reordering framework with Knowledge Distillation (IWR-KD). This framework is inspired by the observation that deep networks are good at learning feature linearizations corresponding to meaningful data transformations, e.g., word reordering. To realize this idea, we introduce a knowledge distillation framework composed of a word-reordering teacher model and a dependency parsing student model. We verify our proposed method on Universal Dependencies treebanks across 31 different languages and show that it outperforms a series of competitors, together with experimental analysis illustrating how our method works towards training a robust parser.
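
To make the teacher-student idea concrete, the following is a minimal sketch of a generic knowledge distillation objective: a weighted sum of a hard-label cross-entropy term and a temperature-softened KL term that pulls the student towards the teacher. It is not the paper's actual loss. It assumes, for illustration only, that teacher and student score the same small label set (for example, whether a head lies to the left or right of its dependent), and the names `distillation_loss`, `alpha`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, gold_index):
    """Hard-label loss: negative log-probability of the gold class."""
    return -math.log(softmax(logits)[gold_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, gold_index,
                      alpha=0.5, temperature=2.0):
    """Generic distillation objective (an assumption, not the paper's formula).

    alpha weights the soft teacher term against the hard gold-label term;
    temperature softens both distributions before they are compared.
    """
    hard = cross_entropy(student_logits, gold_index)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # The temperature**2 factor is the usual rescaling of the soft term.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft


# Toy example: two classes, e.g. "head on the left" vs. "head on the right".
loss = distillation_loss(student_logits=[1.0, 0.5],
                         teacher_logits=[0.2, 1.5],
                         gold_index=0)
print(f"combined loss: {loss:.4f}")
```

Setting `alpha=0` in this sketch recovers plain hard-label training, while `alpha=1` trains on the teacher's soft distribution alone, which is the kind of contrast the TL;DR's "distillation over hard labels" ablation refers to.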

Paper Structure

This paper contains 21 sections, 20 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison between different methods for Word Ordering Difference in cross-lingual dependency parsing. (a) Removing word order information. (b) Permuting the words in a source sentence to resemble the word order of a target language. (c) Our method adapts the word order in the feature space. Red arrows indicate reordering steps.
  • Figure 2: An example of an English sentence that is explicitly reordered to resemble Estonian syntactic order.
  • Figure 3: An overview of our IWR-KD: (i) Word Reordering Teacher decides the new direction between a dependent word and its head. (ii) Dependency Parsing Student is supervised by the teacher and the gold dependency parsing labels simultaneously.
  • Figure 4: Word order distance and performance. Languages (x-axis) are sorted from left to right by their order typology distance to English (Ahmad et al., 2019).
  • Figure 5: Word order distance predicted by different word reordering teachers. The green bar indicates original word order distance between English and the target language.