Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning

Fred Philippy, Siwen Guo, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé

TL;DR

RoSPrompt addresses cross-lingual zero-shot classification with a data-efficient soft-prompt training regime that blends Soft Prompt Tuning and Nonparametric Prompting. It introduces multilingual verbalizers, contrastive label smoothing, and a penalty term to improve generalization under data distribution shifts, while keeping training lightweight on small multilingual LMs. Trained on English data and evaluated on datasets covering 106 languages, RoSPrompt achieves robust cross-lingual transfer and better generalization to unseen classes, outperforming several baselines while remaining efficient. The approach offers a practical, scalable solution for cross-lingual topic classification and points to possible extensions to broader multilingual NLP tasks.

Abstract

In NLP, Zero-Shot Classification (ZSC) has become essential for enabling models to classify text into categories unseen during training, particularly in low-resource languages and domains where labeled data is scarce. While pretrained language models (PLMs) have shown promise in ZSC, they often rely on large training datasets or external knowledge, limiting their applicability in multilingual and low-resource scenarios. Recent approaches leveraging natural language prompts reduce the dependence on large training datasets but struggle to effectively incorporate available labeled data from related classification tasks, especially when these datasets originate from different languages or distributions. Moreover, existing prompt-based methods typically rely on manually crafted prompts in a specific language, limiting their adaptability and effectiveness in cross-lingual settings. To address these challenges, we introduce RoSPrompt, a lightweight and data-efficient approach for training soft prompts that enhance cross-lingual ZSC while ensuring robust generalization across data distribution shifts. RoSPrompt is designed for small multilingual PLMs, enabling them to leverage high-resource languages to improve performance in low-resource settings without requiring extensive fine-tuning or high computational costs. We evaluate our approach on multiple multilingual PLMs across datasets covering 106 languages, demonstrating strong cross-lingual transfer performance and robust generalization capabilities over unseen classes.

Paper Structure

This paper contains 30 sections, 12 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Conventional SPT (Lester et al., 2021), while effective in leveraging existing data, requires distinct training for each topic classification task. Conversely, NPPrompt (Zhao et al., 2023) offers versatility with a single natural language prompt for various tasks but cannot leverage existing labeled data. Our method combines the strengths of both, enabling data utilization with a single soft prompt applicable across diverse topic classification tasks while overcoming the drawbacks of each.
  • Figure 2: Visual representation of RoSPrompt. During training, each class is represented by a multilingual set of label tokens (①). We apply contrastive label smoothing (②) to the probability distribution across the entire vocabulary. To further deter overfitting, we integrate a custom penalty (③) into the loss function. During inference, we retrieve the logits predicted by the model and use the aggregation technique proposed by Zhao et al. (2023) to make the final prediction.
  • Figure 3: Average performance (accuracy) of RoSPrompt across 10 languages on SIB-200 for different values of $\epsilon$ and $\alpha$.
  • Figure 4: The prompt used for the Zero-Shot LLMs baseline with Llama3.1-8B and Phi-3.5-mini.
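
The pipeline summarized in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation. It uses plain label smoothing as a stand-in for the contrastive variant, and a simple mean over label-token logits as a stand-in for the aggregation of Zhao et al. (2023). The custom penalty is omitted because this summary does not define it. All function names, the toy vocabulary, and the use of `epsilon` as the smoothing factor are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_target(vocab_size, label_token_ids, epsilon):
    """Target distribution over the whole vocabulary (assumed, plain smoothing).

    Mass 1 - epsilon is shared equally by the gold class's label tokens;
    mass epsilon is spread uniformly over all remaining vocabulary entries.
    """
    label_ids = set(label_token_ids)
    n_rest = vocab_size - len(label_ids)
    if n_rest == 0:  # degenerate case: every token is a label token
        return [1.0 / vocab_size] * vocab_size
    on = (1.0 - epsilon) / len(label_ids)
    off = epsilon / n_rest
    return [on if i in label_ids else off for i in range(vocab_size)]


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def predict(logits, verbalizer):
    """Score each class by the mean logit of its label tokens (assumed rule)."""
    scores = {
        cls: sum(logits[i] for i in ids) / len(ids)
        for cls, ids in verbalizer.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical 6-token vocabulary; each class maps to label tokens that
# could come from different languages (e.g. "sport" and "Sport").
verbalizer = {"sports": [0, 1], "politics": [2, 3]}
mask_logits = [2.0, 1.0, 0.5, 0.0, -1.0, -1.0]

target = smoothed_target(len(mask_logits), verbalizer["sports"], epsilon=0.1)
loss = cross_entropy(target, mask_logits)
label, scores = predict(mask_logits, verbalizer)
```

In this toy setup only the soft prompt would be updated by the loss while the language model stays frozen, which is what keeps soft prompt tuning lightweight. Because the verbalizer is just a mapping from class names to token ids, new classes can be added at inference time without retraining.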