Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages

Shivanshu Gupta, Yoshitomo Matsubara, Ankit Chadha, Alessandro Moschitti

TL;DR

This work tackles the lack of labeled AS2 data in low-resource languages by proposing Cross-Lingual Knowledge Distillation (CLKD), where a strong English AS2 teacher guides a target-language student using soft labels derived from translations. It introduces Xtr-WikiQA and TyDi-AS2 to evaluate CLKD under translation-based and original-language data scenarios, respectively. Across a broad set of teachers, students, and training regimes, CLKD rivals or surpasses supervised fine-tuning with the same amount of labeled data, with notable gains when original target-language data are used, and in some settings it even outperforms MT-based pipelines. The results suggest CLKD is a practical route to building high-quality AS2 systems for many languages, and the TyDi-AS2 dataset provides a multilingual benchmark for future studies.

Abstract

While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages without the need for labeled data in the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset for 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results demonstrate that CLKD outperforms or rivals both supervised fine-tuning with the same amount of labeled data and a pipeline that combines machine translation with the teacher model. Our method can potentially enable stronger AS2 models for low-resource languages, while TyDi-AS2 can serve as the largest multilingual AS2 dataset for further studies in the research community.
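
To make the teacher-student idea concrete, the following is a minimal sketch of a generic soft-label distillation loss. It is not taken from the paper: the function names, the two-class logit layout, the temperature parameter, and the example numbers are all assumptions for illustration. It assumes the English teacher scores a question-candidate pair in English, the student scores the same pair in the target language, and the student is trained to match the teacher's probability distribution rather than a human label.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into probabilities (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy between teacher soft labels and student predictions.

    Hypothetical helper: a generic soft-label objective, assumed here
    for illustration and not confirmed as the paper's exact loss.
    """
    soft_labels = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(soft_labels, student_probs))


# Made-up logits in the order [not an answer, answer].
# Teacher: English model scoring the English version of the pair.
teacher_logits = [-1.2, 2.3]
# Student: target-language model scoring the same pair in its own language.
student_logits = [0.1, 0.4]

loss = distillation_loss(teacher_logits, student_logits)
print(f"distillation loss: {loss:.4f}")
```

Because the target is the teacher's output distribution, no human-annotated label for the target language enters the loss, which is the property the abstract emphasizes. The loss is smallest when the student reproduces the teacher's distribution exactly.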

Paper Structure (29 sections, 1 equation, 1 figure, 9 tables)

Figures (1)

  • Figure 1: Cross-Lingual Knowledge Distillation (CLKD) in two different scenarios: (Top) using an unlabeled English AS2 dataset for a target low-resource language that lacks any data, and (Bottom) using an unlabeled AS2 dataset in the original low-resource language. CLKD enables student AS2 models to learn from English teacher AS2 models without human-annotated datasets.