Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian
Serena Auriemma, Martina Miliani, Mauro Madeddu, Alessandro Bondielli, Lucia Passaro, Alessandro Lenci
TL;DR
This study investigates zero-shot classification of Italian Public Administration and legal texts using small, domain-adapted encoder LMs (BureauBERTo and Ita-Legal-BERT) guided by prompting. It systematically compares these models to a generic Italian model (UmBERTo) under three verbalizers (base, manual, knowledgeable) and two calibration schemes (contextual and batch), evaluating on document-topic classification and entity-typing tasks and measuring linguistic competence via Pseudo-Log-Likelihood (PLL). Key findings show that domain-adapted models excel on domain-specific tasks when paired with appropriate verbalizers and calibration schemes: batch calibration often delivers the strongest overall gains, while knowledgeable verbalizers boost performance in certain setups. The work demonstrates that smaller, specialized encoders can effectively support domain-specific NLP in low-resource settings, offering practical pathways for digital transformation without relying on LLMs. It also highlights PLL scores as a complementary tool for assessing domain linguistic competence and for guiding model selection and calibration strategies in niche languages and domains.
Abstract
Addressing the challenge of limited annotated data in specialized fields and low-resource languages is crucial for the effective use of Language Models (LMs). While most Large Language Models (LLMs) are trained on general-purpose English corpora, there is a notable gap in models specifically tailored for Italian, particularly for technical and bureaucratic jargon. This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in these specialized contexts. Our study concentrates on Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models. We evaluate the models on downstream tasks, namely document classification and entity typing, and conduct intrinsic evaluations using Pseudo-Log-Likelihood. The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting. Furthermore, the application of calibration techniques and in-domain verbalizers significantly enhances the efficacy of encoder models. These domain-specialized models prove to be particularly advantageous in scenarios where in-domain resources or expertise are scarce. In conclusion, our findings offer new insights into the use of Italian models in specialized contexts, which may have a significant impact on both research and industrial applications in the digital transformation era.
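To make the calibration idea mentioned above more concrete, the following is a minimal sketch of one common formulation of batch calibration: each class score is shifted by the mean score that class receives across an unlabeled batch, which counteracts a model's prior preference for certain label words. The abstract does not spell out the exact procedure used in the study, so this formulation is an assumption; the function name `batch_calibrate` and the score matrix are hypothetical and purely illustrative.

```python
import numpy as np


def batch_calibrate(log_probs: np.ndarray) -> np.ndarray:
    """Shift each class score by its mean over the batch.

    log_probs has shape (n_examples, n_classes) and holds the
    log-probabilities a masked LM assigns to each class's label words.
    Assumption: the bias is estimated as the per-class batch mean.
    """
    class_bias = log_probs.mean(axis=0, keepdims=True)
    return log_probs - class_bias


# Hypothetical label-word probabilities for 4 documents and 3 classes.
# The model favors class 0 for every input, regardless of content.
scores = np.log(np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.10, 0.25],
    [0.80, 0.15, 0.05],
]))

raw_pred = scores.argmax(axis=1)                   # predictions before calibration
cal_pred = batch_calibrate(scores).argmax(axis=1)  # predictions after calibration
print(raw_pred, cal_pred)
```

In this toy batch the uncalibrated predictions collapse onto class 0, whereas the calibrated scores rank each document by how much it deviates from the batch average for each class. No labels are needed to estimate the shift, which is why this kind of correction fits a zero-shot setting.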
