Manual Verbalizer Enrichment for Few-Shot Text Classification
Quang Anh Nguyen, Nadi Tomeh, Mustapha Lebbah, Thierry Charnois, Hanene Azzag, Santiago Cordoba Muñoz
TL;DR
The paper addresses few-shot text classification under prompt-based learning by focusing on verbalizers, which map masked LM outputs to class labels. It introduces MAVE, which enlarges manual verbalizers with semantically related words drawn from the nearest neighbors of each label word in the model's embedding space. The enriched label words are aggregated per class with weights learned during fine-tuning, so that each class score is a weighted average of label-word logits, and an ensemble of templates further stabilizes results: $p(y|x) \propto \exp\left( \frac{\sum_{w \in \hat{v}(y)} q^y_w \mathcal{M}(w|T(x))}{\sum_{w \in \hat{v}(y)} q^y_w } \right)$ with $\hat{v}(y) = \bigcup_{w_0 \in v(y)} \mathcal{N}_k(w_0)$, where $v(y)$ is the manual verbalizer of class $y$, $\mathcal{N}_k(w_0)$ the $k$ nearest neighbors of $w_0$ in the embedding space, $q^y_w$ the learned weight of label word $w$ for class $y$, and $\mathcal{M}(w|T(x))$ the masked LM logit of $w$ for input $x$ wrapped in template $T$. Empirical evaluation on English and French datasets shows that MAVE achieves state-of-the-art performance in extremely low-data regimes (e.g., $N=32$) and remains competitive with larger instruction-tuned LLMs, highlighting the value of embedding-space-informed verbalizers and ensemble prompting for practical, resource-efficient NLP. The findings suggest that carefully constructed verbalizers enriched with neighborhood information can rival or exceed larger models while consuming substantially fewer resources. Overall, the work emphasizes the importance of label-word semantics and template robustness in prompt-based few-shot learning and motivates future extensions to decoder-based and multilingual LMs.
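The scoring rule above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and several pieces are assumptions: word embeddings are a toy dictionary, $\mathcal{N}_k(w_0)$ is taken to be the $k$ vocabulary words closest to $w_0$ by cosine similarity (which includes $w_0$ itself), the weights $q^y_w$ are fixed to 1.0 instead of being learned during fine-tuning, and the template ensemble simply averages per-template class probabilities. The helper names (`expand_verbalizer`, `class_probs`, `ensemble`) and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_verbalizer(seeds: Dict[str, List[str]],
                      emb: Dict[str, Sequence[float]],
                      k: int) -> Dict[str, List[str]]:
    """hat{v}(y): union over seed words w0 of N_k(w0).

    Assumption: N_k(w0) = the k words in `emb` most cosine-similar to w0,
    which includes w0 itself (similarity 1.0).
    """
    expanded = {}
    for label, words in seeds.items():
        bag: List[str] = []
        for w0 in words:
            ranked = sorted(emb, key=lambda w: cosine(emb[w0], emb[w]), reverse=True)
            for w in ranked[:k]:
                if w not in bag:  # set union, keeping first-seen order
                    bag.append(w)
        expanded[label] = bag
    return expanded


def class_probs(mask_logits: Dict[str, float],
                verbalizer: Dict[str, List[str]],
                q: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """p(y|x) proportional to exp(weighted average of label-word logits)."""
    scores = {}
    for label, words in verbalizer.items():
        num = sum(q[label][w] * mask_logits[w] for w in words)
        den = sum(q[label][w] for w in words)
        scores[label] = num / den
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def ensemble(per_template: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical template ensemble: mean of per-template class probabilities."""
    labels = per_template[0].keys()
    return {y: sum(p[y] for p in per_template) / len(per_template) for y in labels}


# Toy usage: 2-d "embeddings" and made-up masked LM logits for two templates.
emb = {
    "good": [1.0, 0.1], "great": [0.9, 0.2], "fine": [0.8, 0.3],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2], "poor": [-0.8, 0.3],
}
seeds = {"positive": ["good"], "negative": ["bad"]}
verb = expand_verbalizer(seeds, emb, k=2)
q = {y: {w: 1.0 for w in ws} for y, ws in verb.items()}  # uniform, not learned

logits_t1 = {"good": 2.0, "great": 3.0, "fine": 1.0, "bad": 0.0, "awful": -1.0, "poor": 0.5}
logits_t2 = {"good": 1.0, "great": 1.0, "fine": 0.0, "bad": 1.0, "awful": 0.0, "poor": 0.0}
probs = class_probs(logits_t1, verb, q)
avg = ensemble([probs, class_probs(logits_t2, verb, q)])
print(verb)
print(probs, avg)
```

With $k=2$, each seed word keeps itself plus its closest neighbor ("great" for "good", "awful" for "bad"). Dividing by $\sum_w q^y_w$ keeps class scores comparable even when neighbor expansion gives classes different numbers of label words.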
Abstract
With the continuous development of pre-trained language models, prompt-based training has become a widely adopted paradigm that drastically improves the exploitation of these models for many natural language processing tasks. Prompting also performs strongly compared to traditional fine-tuning in zero-shot and few-shot scenarios, where annotated data are limited. In this framework, verbalizers play an essential role: they map masked-word distributions to output predictions. In this work, we propose MAVE, an approach to verbalizer construction for text classification that enriches class labels using neighborhood relations in the word embedding space. In addition, we design a benchmarking procedure to evaluate typical verbalizer baselines for document classification in few-shot learning settings. Our model achieves state-of-the-art results while using significantly fewer resources, and we show that our approach is particularly effective when supervision data are extremely limited.
