Language Models are Few-shot Multilingual Learners

Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Rosanne Liu, Jason Yosinski, Pascale Fung

TL;DR

This paper demonstrates that large pre-trained language models can perform few-shot, multilingual intent classification without gradient updates by using carefully crafted prompts in an in-context learning setup. It introduces a Maximum Confidence Prediction mechanism to select among multiple label options and shows that English-context prompts can transfer to non-English test inputs on the MTOP and Multilingual NLU datasets. Results indicate that model size substantially boosts performance, with GPT variants generally outperforming T5 and cross-lingual predictions improving as the number of in-context examples grows, though accuracy on non-English inputs lags somewhat behind English. The findings highlight the practical potential of zero-shot and few-shot cross-lingual inference for low-resource languages and suggest directions for future work on broader language coverage and prompt optimization.

Abstract

General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and benchmarks when inferring instructions from very few examples. Here, we evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages without any parameter updates. We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones. Finally, we find the in-context few-shot cross-lingual prediction results of language models are significantly better than random prediction, and they are competitive compared to the existing state-of-the-art cross-lingual models.
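
The Maximum Confidence Prediction step mentioned in the TL;DR can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this summary: each candidate label is scored with its own prompt, the prompt uses a `text => label => true/false` layout, and confidence is the probability of "true" normalized against "false". The names `build_prompt`, `max_confidence_predict`, and `toy_scorer` are hypothetical, and `toy_scorer` is a keyword-matching stand-in for a real language model, so this is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer interface: given a prompt, return log-probabilities
# of the next token being "true" or "false".
Scorer = Callable[[str], Dict[str, float]]
Example = Tuple[str, str, bool]  # (utterance, candidate label, is it correct?)


def build_prompt(examples: List[Example], query: str, label: str) -> str:
    """Concatenate English context examples, then the (query, label) pair to judge.

    The "=>" layout is an assumed format, chosen only for illustration.
    """
    lines = [
        f"{text} => {lab} => {'true' if ok else 'false'}"
        for text, lab, ok in examples
    ]
    lines.append(f"{query} => {label} =>")
    return "\n".join(lines)


def max_confidence_predict(
    examples: List[Example], query: str, labels: List[str], scorer: Scorer
) -> Tuple[str, float]:
    """Score every candidate label and keep the one with the highest confidence."""
    best_label, best_conf = labels[0], -1.0
    for label in labels:
        logp = scorer(build_prompt(examples, query, label))
        p_true, p_false = math.exp(logp["true"]), math.exp(logp["false"])
        conf = p_true / (p_true + p_false)  # normalize over the two answers
        if conf > best_conf:
            best_label, best_conf = label, conf
    return best_label, best_conf


def toy_scorer(prompt: str) -> Dict[str, float]:
    """Stand-in for a language model: crude substring match, no learning."""
    query, label = [part.strip() for part in prompt.splitlines()[-1].split("=>")[:2]]
    p_true = 0.9 if label[:4].lower() in query.lower() else 0.2
    return {"true": math.log(p_true), "false": math.log(1.0 - p_true)}


# English context examples, Spanish test query (cross-lingual setting).
context = [
    ("set an alarm for 7 am", "alarm", True),
    ("set an alarm for 7 am", "weather", False),
]
prediction, confidence = max_confidence_predict(
    context, "pon una alarma a las siete", ["alarm", "weather", "music"], toy_scorer
)
```

With a real model, `toy_scorer` would be replaced by a call that reads the next-token log-probabilities for "true" and "false"; no parameters are updated at any point, which matches the gradient-free setting described above.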

Paper Structure

This paper contains 30 sections, 3 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: The average accuracy vs. model size on the English-Spanish Multilingual NLU dataset achieved by cross-lingual in-context learning using various GPT and T5 models. The shaded region represents the standard deviation of three runs. The all-shot results are taken from Liu et al. (2020).
  • Figure 2: Example of inference and query generation in few-shot learning, where the source language and target language are German and English, respectively.
  • Figure 3: The results on German (de) MTOP dataset with GPT models.
  • Figure 4: The results on English (en) MTOP dataset with GPT models.
  • Figure 5: The results on Spanish (es) MTOP dataset with GPT models.
  • ...and 13 more figures