Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training

Nghia Trung Ngo, Thien Huu Nguyen

TL;DR

This study determines the correlation between single-transfer performance and a wide range of linguistic-based distances, and investigates the more general zero-shot multi-lingual transfer settings where multiple languages are involved in the training and evaluation processes.

Abstract

Most previous research on multi-lingual IE is limited to the zero-shot cross-lingual single-transfer (one-to-one) setting, with high-resource languages predominantly used as source training data. As a result, these works provide little understanding of, and little benefit for, the realistic goal of developing a multi-lingual IE system that can generalize to as many languages as possible. Our study aims to fill this gap by providing a detailed analysis of Cross-Lingual Multi-Transferability (many-to-many transfer learning) for recent IE corpora that cover a diverse set of languages. Specifically, we first determine the correlation between single-transfer performance and a wide range of linguistic-based distances. From the obtained insights, a combined language distance metric can be developed that is not only highly correlated with transfer performance but also robust across different tasks and model scales. Next, we investigate the more general zero-shot multi-lingual transfer settings, where multiple languages are involved in the training and evaluation processes. Language clustering based on the newly defined distance can provide directions for achieving the optimal cost-performance trade-off in the data (language) selection problem. Finally, a relational-transfer setting is proposed to further incorporate multi-lingual unlabeled data through adversarial training, using the relation induced from the above linguistic distance.
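
As an illustration of the first analysis step described in the abstract, the sketch below computes a Pearson correlation between a language distance and single-transfer scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `pearson`, the variable names, and all numeric values are hypothetical placeholders chosen for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factor cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, NOT results from the paper:
# one distance per source language to a fixed target language, and the
# score obtained when transferring from that source language.
distances = [0.10, 0.25, 0.40, 0.55, 0.70]
f1_scores = [0.82, 0.74, 0.69, 0.58, 0.51]

r = pearson(distances, f1_scores)
print(f"Pearson r = {r:.3f}")
```

With these toy values the coefficient is strongly negative, which is the pattern one would expect if a larger linguistic distance goes with weaker transfer. Whether a given distance behaves this way on real IE corpora is what the study measures.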

Paper Structure

This paper contains 17 sections, 14 figures, 3 tables.

Figures (14)

  • Figure 1: The pairwise Pearson correlation for all computed language distances.
  • Figure 2: Feature importance weights of the optimal combined metric for each (task, model scale) setting. Small, base, and large models are represented by the colors red, green, and blue, respectively.
  • Figure 3: The language-based average Pearson correlation scores of all computed linguistic distances (including the combined metric).
  • Figure 4: Language clustering results for languages in SMiLER (a and b) and MINION (c and d). The graphs on the right (b and d) are the same as the ones on the left, but with connected medoids indicated by the new red edges.
  • Figure 5: Detailed transfer performances for MINION task in ZSCL-S setting.
  • ...and 9 more figures