CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification
Yang Li, Canran Xu, Guodong Long, Tao Shen, Chongyang Tao, Jing Jiang
TL;DR
This paper tackles the verbalizer ambiguity that arises when prefix-tuning is applied to many-class classification. It introduces CCPrefix, a method that builds instance-dependent soft prefixes from fact-counterfactual label pairs, aligns these prefixes with global prototypes, and employs a Siamese training objective to stabilize learning. Across relation classification, topic classification, and entity typing, CCPrefix achieves state-of-the-art results or strong gains in both fully supervised and few-shot settings, outperforming the PTR, ProtoVerb, and PETAL baselines. By reducing reliance on manually crafted prompts and label words, the method is more robust to large label spaces and remains applicable as language models evolve.
Abstract
Prefix-tuning was recently proposed to efficiently adapt pre-trained language models to a broad spectrum of natural language classification tasks. It uses soft prefixes as task-specific indicators and language verbalizers as categorical-label mentions to narrow the gap between the downstream task formulation and language model pre-training. However, when the label space grows considerably (i.e., many-class classification), this tuning technique suffers from a verbalizer ambiguity problem, because the many class labels are represented by semantically similar verbalizers in the form of short language phrases. To overcome this, and inspired by the human decision process in which the most ambiguous classes are mulled over for each instance, we propose a new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix), for many-class classification. Specifically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, complements the language verbalizers. Experiments on many-class benchmark datasets in both the fully supervised and few-shot settings indicate that our model outperforms prior baselines.
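The idea of an instance-dependent prefix built from fact-counterfactual label pairs can be illustrated with a toy sketch. This is not the authors' implementation: the function name `contrastive_prefix`, the use of plain label-embedding differences as contrastive vectors, the dot-product scoring, and the `top_k` candidate selection are all assumptions made for illustration. The sketch treats each of the most plausible labels for an instance as a hypothetical fact, pairs it with the other candidates as counterfactuals, and mixes the resulting difference vectors with attention weights computed from the instance.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_prefix(instance, label_embs, top_k=2):
    """Toy instance-dependent prefix (hypothetical helper, not from the paper)."""
    # Score every label against the instance representation.
    scores = [dot(instance, e) for e in label_embs]
    # The top-scoring labels are the ones most easily confused for this instance.
    ranked = sorted(range(len(label_embs)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:top_k]
    # Each ordered (fact, counterfactual) pair yields one contrastive vector,
    # assumed here to be the embedding difference fact - counterfactual.
    pairs = [(f, c) for f in candidates for c in candidates if f != c]
    diffs = [[a - b for a, b in zip(label_embs[f], label_embs[c])] for f, c in pairs]
    # The instance attends over the contrastive vectors.
    weights = softmax([dot(instance, d) for d in diffs])
    # The prefix is the attention-weighted sum of contrastive vectors.
    prefix = [sum(w * d[j] for w, d in zip(weights, diffs)) for j in range(len(instance))]
    return prefix, pairs, weights

# Labels 0 and 1 have near-identical embeddings (ambiguous verbalizers); label 2 is distinct.
label_embs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
instance = [1.0, 0.2, 0.0]
prefix, pairs, weights = contrastive_prefix(instance, label_embs)
```

In this toy setting the prefix points along the direction that separates the two confusable labels, which is the kind of complementary signal the abstract attributes to the instance-dependent prefix. In an actual prefix-tuning setup the vector would be a learned, trainable quantity prepended to the model input rather than computed in closed form.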
