Effectively Prompting Small-sized Language Models for Cross-lingual Tasks via Winning Tickets

Mingqi Li, Feng Luo

TL;DR

The paper addresses cross-lingual transfer to low-resource languages with small language models by introducing Lottery Ticket Prompt-learning (LTP), which combines the Lottery Ticket Hypothesis with soft prompting. It identifies the sparse subset of backbone parameters that change the most during fine-tuning with an English Masked Language Modeling (MLM) objective, prepends soft prompts to the model, and trains only these parameters plus the prompts, which supports efficient zero-shot and in-language cross-lingual adaptation. On XNLI and AmericasNLI, LTP outperforms strong baselines while updating only a fraction of the backbone parameters (as low as 20%), with middle-layer tickets offering particularly high efficiency. The work advances practical multilingual NLP by reducing reliance on external linguistic resources and by showing that carefully selected, language-agnostic subnetworks can drive cross-lingual transfer for low-resource languages.

Abstract

Current soft prompt methods yield limited performance when applied to small-sized models (fewer than a billion parameters). Deep prompt-tuning, which entails prepending parameters in each layer for enhanced efficacy, presents a solution for prompting small-sized models, albeit one that requires a carefully designed implementation. In this paper, we introduce the Lottery Ticket Prompt-learning (LTP) framework, which integrates winning tickets with soft prompts. LTP offers a simpler implementation and requires only a one-time execution. We demonstrate LTP on cross-lingual tasks, where prior works rely on external tools such as human-designed multilingual templates and bilingual dictionaries, which may not be feasible in a low-resource regime. Specifically, we select the subset of parameters that change the most during fine-tuning with the Masked Language Modeling objective. Then, we prepend soft prompts to the original pre-trained language model and update only the selected parameters, together with the prompt-related parameters, when adapting to the downstream tasks. We verify the effectiveness of our LTP framework on cross-lingual tasks, specifically targeting low-resource languages. Our approach outperforms the baselines while updating only 20% of the original parameters.
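
The two steps described in the abstract (pick the parameters that moved the most during MLM fine-tuning, then update only those plus the prompt parameters) can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the function names `select_winning_ticket` and `masked_sgd_step`, the toy parameter values, the global top-k ranking by absolute change, and the plain SGD update are all assumptions made for illustration. The paper compares several selection strategies, and this sketch shows only one plausible reading.

```python
import heapq


def select_winning_ticket(pretrained, finetuned, active_ratio=0.2):
    """Return a 0/1 mask marking the parameters that changed the most.

    `pretrained` and `finetuned` map parameter names to flat lists of floats
    (hypothetical toy representation of model weights).
    """
    # Score every scalar parameter by how far MLM fine-tuning moved it.
    scores = [
        (abs(finetuned[name][i] - value), name, i)
        for name, values in pretrained.items()
        for i, value in enumerate(values)
    ]
    # Keep the top `active_ratio` share of all parameters (assumed: global ranking).
    k = int(round(active_ratio * len(scores)))
    mask = {name: [0] * len(values) for name, values in pretrained.items()}
    for _, name, i in heapq.nlargest(k, scores):
        mask[name][i] = 1
    return mask


def masked_sgd_step(params, grads, mask, lr=0.1):
    """Apply one SGD step, leaving every entry with mask bit 0 unchanged."""
    return {
        name: [p - lr * g * m
               for p, g, m in zip(params[name], grads[name], mask[name])]
        for name in params
    }


# Toy weights before and after MLM fine-tuning (made-up numbers).
pretrained = {"layer1": [0.10, -0.20, 0.30, 0.40, 0.50],
              "layer2": [1.00, -1.00, 0.00, 0.25, 0.75]}
finetuned = {"layer1": [0.10, 0.40, 0.31, 0.40, 0.50],
             "layer2": [1.00, -1.00, 0.90, 0.25, 0.74]}

mask = select_winning_ticket(pretrained, finetuned, active_ratio=0.2)

# Downstream adaptation starts from the original pre-trained weights.
# Soft-prompt parameters are newly added and always trainable (mask of ones).
params = dict(pretrained, soft_prompt=[0.0, 0.0])
mask["soft_prompt"] = [1, 1]
grads = {name: [1.0] * len(values) for name, values in params.items()}

updated = masked_sgd_step(params, grads, mask, lr=0.1)
```

With an active ratio of 0.2 over ten backbone parameters, two entries are selected; only those two and the prompt entries move during the update, while the remaining backbone weights keep their pre-trained values.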

Paper Structure

This paper contains 20 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Lottery Ticket Prompt-learning framework: a) the parameter selection step is presented on the left, where frozen parameters are denoted in blue, while trainable parameters are highlighted in orange; b) the prompt-learning step is depicted on the right, where the generated binary masks keep certain parameters unchanged.
  • Figure 2: Detailed analysis of different strategies and active ratios. The performance is the average of 15 languages in the zero-shot cross-lingual transfer setting. In the left figure, we select 20% as the active ratio.
  • Figure 3: Parameter distributions of different strategies. 0 denotes the embedding layer, while 1-12 correspond to Transformer layers 1-12.