Manual Verbalizer Enrichment for Few-Shot Text Classification

Quang Anh Nguyen; Nadi Tomeh; Mustapha Lebbah; Thierry Charnois; Hanene Azzag; Santiago Cordoba Muñoz

TL;DR

The paper addresses few-shot text classification under prompt-based learning by focusing on verbalizers, which map masked LM outputs to class labels. It introduces MAVE, which enlarges a manual verbalizer $v(y)$ with semantically related words drawn from the nearest neighbors of each label word in the model's embedding space: $\hat{v}(y) = \bigcup_{w_0 \in v(y)} \mathcal{N}_k(w_0)$. The neighbors are aggregated per class and weighted by coefficients $q^y_w$ learned during fine-tuning, so that the prediction is a weighted average of label-word logits: $p(y|x) \propto \exp\left( \frac{\sum_{w \in \hat{v}(y)} q^y_w \mathcal{M}(w|T(x))}{\sum_{w \in \hat{v}(y)} q^y_w } \right)$, where $\mathcal{M}(w|T(x))$ is the masked LM's logit for word $w$ given the templated input $T(x)$. Ensembling over templates further stabilizes results. Empirical evaluation across English and French datasets shows that MAVE achieves state-of-the-art performance in extremely low-data regimes (e.g., $N=32$) and can rival or exceed larger instruction-tuned LLMs while consuming substantially fewer resources. Overall, the work emphasizes the significance of label-word semantics and template robustness in prompt-based few-shot learning and motivates future extensions to decoder-based and multilingual LMs.
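The two formulas in the TL;DR can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of cosine similarity, the choice to count a seed word as its own nearest neighbor, and the toy embedding matrix are all assumptions made for illustration. In practice the embeddings and mask logits would come from the language model, and the weights $q^y_w$ would be learned during fine-tuning rather than fixed.

```python
import numpy as np

def nearest_neighbors(emb, word_id, k):
    """Ids of the k words closest to word_id by cosine similarity.
    Assumption: the seed word itself counts as its own nearest neighbor."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit[word_id]
    return np.argsort(-sims)[:k]

def enrich_verbalizer(emb, manual, k):
    """Build v_hat(y): the union of N_k(w0) over the seed words w0 in v(y).
    manual maps each class to a list of seed word ids."""
    return {
        y: np.unique(np.concatenate([nearest_neighbors(emb, w0, k) for w0 in seeds]))
        for y, seeds in manual.items()
    }

def class_probs(mask_logits, verbalizer, weights):
    """p(y|x) as a softmax over the weighted average of label-word logits.
    mask_logits: vector of M(w | T(x)) over the vocabulary.
    weights[y]: positive weights q^y_w aligned with verbalizer[y]."""
    classes = sorted(verbalizer)
    scores = np.array([
        np.dot(weights[y], mask_logits[verbalizer[y]]) / weights[y].sum()
        for y in classes
    ])
    scores = scores - scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    return classes, p / p.sum()

# Toy vocabulary of 6 words with 2-dimensional embeddings (hypothetical values).
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                [0.1, 0.9], [0.7, 0.7], [-1.0, 0.0]])
manual = {0: [0], 1: [2]}                      # one seed word per class
verbalizer = enrich_verbalizer(emb, manual, k=2)
weights = {y: np.ones(len(ids)) for y, ids in verbalizer.items()}  # uniform start
mask_logits = np.array([2.0, 1.0, 0.0, 0.0, 5.0, 0.0])
classes, probs = class_probs(mask_logits, verbalizer, weights)
print(verbalizer, classes, probs)
```

With uniform weights the score of a class is simply the mean logit of its label words; learning $q^y_w$ lets fine-tuning down-weight neighbors that turn out to be uninformative.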

Abstract

With the continuous development of pre-trained language models, prompt-based training has become a well-adopted paradigm that drastically improves the exploitation of models for many natural language processing tasks. Prompting also performs well compared to traditional fine-tuning in zero-shot or few-shot scenarios, where the amount of annotated data is limited. In this framework, the role of verbalizers is essential: they translate masked word distributions into output predictions. In this work, we propose MAVE, an approach to verbalizer construction for text classification that enriches class labels using neighborhood relations in the word embedding space. In addition, we elaborate a benchmarking procedure to evaluate typical verbalizer baselines for document classification in few-shot learning contexts. Our model achieves state-of-the-art results while using significantly fewer resources. We show that our approach is particularly effective in cases with extremely limited supervision data.

Paper Structure (31 sections, 10 equations, 5 figures, 6 tables)

Introduction
Related Works
Prompt-based fine-tuning
Enrichment of manual verbalizer
Methodology
Baselines
Manual
Soft
Auto
Instruction tuned LLM (Instruct)
Manual Verbalizer Enrichment by Nearest Neighbors' Embeddings
Experiments
Settings
Datasets and templates
AG
...and 16 more sections

Figures (5)

Figure 1: Accuracy of MAVE by number of label words, on four datasets for $N \in \{0, 64\}$. Dashed colored lines represent templates $T$: 0, 1, 2, 3. Solid colored lines represent the ensemble methods: vote, proba, logit.
Figure 2: MAVE accuracy using different embedding spaces (LM, word2vec, GloVe) with varying data amount $N$.
Figure 3: Accuracy of models initialized with automatic verbalizers, with and without MAVE. Each point corresponds to one template under one random data split. All models are fine-tuned with $N=32$ examples.
Figure 4: Improvement with MAVE on logit-averaged models compared to their automatic initialization. All models are fine-tuned with $N=32$ examples.
Figure 5: Study of different sizes for the manual verbalizer on the frn dataset. "title" means using words in class names as label words.
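
The captions above name three template-ensembling methods (vote, proba, logit) without defining them. The sketch below shows one plausible reading, stated here as an assumption rather than the paper's definition: "vote" takes a majority vote over per-template predictions, "proba" averages per-template class probabilities, and "logit" averages per-template class scores before the argmax. The function names and the toy scores are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(scores, method="logit"):
    """Combine class scores from several templates for one input.
    scores has shape (n_templates, n_classes)."""
    if method == "logit":   # average raw class scores, then pick the best class
        return int(np.argmax(scores.mean(axis=0)))
    if method == "proba":   # average per-template probabilities
        return int(np.argmax(softmax(scores).mean(axis=0)))
    if method == "vote":    # majority vote; ties go to the lowest class index
        votes = np.bincount(scores.argmax(axis=1), minlength=scores.shape[1])
        return int(np.argmax(votes))
    raise ValueError(f"unknown ensemble method: {method}")

# Three templates, two classes: one confident template against two mild ones.
scores = np.array([[2.0, 0.0], [0.0, 0.5], [0.0, 0.5]])
for m in ("vote", "proba", "logit"):
    print(m, ensemble_predict(scores, m))
```

In this toy case the vote follows the two mildly confident templates, while the probability and logit averages follow the single confident one, which illustrates why the three ensembles can give different accuracies.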
