A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition

Haojie Zhang, Yimeng Zhuang

TL;DR

The paper addresses few-shot NER by combining label semantics with contrastive learning. It introduces a unified framework that appends natural-language label names to the input context as suffix prompts and jointly optimizes context-context and context-label contrastive objectives. A projection head maps representations to Gaussian embeddings, and inference at test time uses nearest-neighbor prediction. Experiments on OntoNotes, CoNLL'03, WNUT'17, GUM, I2B2, and FEW-NERD show state-of-the-art micro-F1 with strong transfer performance. Ablation and visualization analyses attribute the gains to more discriminative context representations and effective use of label semantics, and suggest the method could extend to other token-level tasks and zero-shot scenarios.
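To make the suffix-prompt idea concrete, here is a minimal sketch of how label names might be appended after a sentence. The function name `build_suffix_prompt`, the `[SEP]` separator, and the label wording ("person", "location", "none") are illustrative assumptions, not the paper's actual prompt template.

```python
from typing import Dict, List, Tuple


def build_suffix_prompt(
    tokens: List[str],
    label_names: Dict[str, str],
    sep_token: str = "[SEP]",  # assumed separator, not taken from the paper
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Append natural-language label names after the sentence tokens.

    Returns the extended token sequence and, for each label, the positions
    that its words occupy in that sequence.
    """
    sequence = list(tokens) + [sep_token]
    label_positions: Dict[str, List[int]] = {}
    for label, name in label_names.items():
        words = name.split()
        start = len(sequence)
        sequence.extend(words)
        label_positions[label] = list(range(start, start + len(words)))
    return sequence, label_positions


# Hypothetical example sentence and label wording.
sentence = ["Barack", "Obama", "visited", "Paris"]
labels = {"PER": "person", "LOC": "location", "O": "none"}
sequence, positions = build_suffix_prompt(sentence, labels)
print(sequence)
print(positions)
```

Tracking the label positions lets a downstream encoder read out one vector per label from the same forward pass that encodes the context tokens, which is one plausible way to obtain the label representations used in a context-label contrast.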

Abstract

Few-shot Named Entity Recognition (NER) aims to extract named entities using only a limited number of labeled examples. Existing contrastive learning methods often suffer from insufficient distinguishability in context vector representation because they either solely rely on label semantics or completely disregard them. To tackle this issue, we propose a unified label-aware token-level contrastive learning framework. Our approach enriches the context by utilizing label semantics as suffix prompts. Additionally, it simultaneously optimizes context-context and context-label contrastive learning objectives to enhance generalized discriminative contextual representations. Extensive experiments on various traditional test domains (OntoNotes, CoNLL'03, WNUT'17, GUM, I2B2) and the large-scale few-shot NER dataset (FEW-NERD) demonstrate the effectiveness of our approach. It outperforms prior state-of-the-art models by a significant margin, achieving an average absolute gain of 7% in micro F1 scores across most scenarios. Further analysis reveals that our model benefits from its powerful transfer capability and improved contextual representations.
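The joint objective described above can be illustrated with a simplified sketch. It is not the paper's loss: it uses plain point vectors with dot-product similarity instead of Gaussian embeddings, and the names `context_context_loss`, `context_label_loss`, `unified_loss`, the `temperature`, and the `weight` that mixes the two terms are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean_log_prob(scores: List[float], positives: List[int]) -> float:
    """Mean log-softmax probability of the positive indices."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return sum(scores[j] - log_z for j in positives) / len(positives)


def context_context_loss(tokens: List[Vector], labels: List[str],
                         temperature: float = 1.0) -> float:
    """Pull each token toward other tokens that share its label."""
    total, count = 0.0, 0
    for i, anchor in enumerate(tokens):
        others = [j for j in range(len(tokens)) if j != i]
        positives = [k for k, j in enumerate(others) if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        scores = [dot(anchor, tokens[j]) / temperature for j in others]
        total -= mean_log_prob(scores, positives)
        count += 1
    return total / count if count else 0.0


def context_label_loss(tokens: List[Vector], labels: List[str],
                       label_vecs: Dict[str, Vector],
                       temperature: float = 1.0) -> float:
    """Pull each token toward the vector of its own label name."""
    names = sorted(label_vecs)
    total = 0.0
    for vec, lab in zip(tokens, labels):
        scores = [dot(vec, label_vecs[n]) / temperature for n in names]
        total -= mean_log_prob(scores, [names.index(lab)])
    return total / len(tokens)


def unified_loss(tokens: List[Vector], labels: List[str],
                 label_vecs: Dict[str, Vector], weight: float = 1.0) -> float:
    # `weight` is a hypothetical mixing coefficient for the two terms.
    return (context_context_loss(tokens, labels)
            + weight * context_label_loss(tokens, labels, label_vecs))


# Toy 2-D vectors: two tokens near the "PER" axis, two near the "LOC" axis.
toy_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_label_vecs = {"PER": [1.0, 0.0], "LOC": [0.0, 1.0]}
aligned = unified_loss(toy_tokens, ["PER", "PER", "LOC", "LOC"], toy_label_vecs)
misaligned = unified_loss(toy_tokens, ["PER", "LOC", "PER", "LOC"], toy_label_vecs)
print(aligned < misaligned)
```

On this toy data the loss is lower when the label assignment matches the geometry of the vectors, which is the behavior a contrastive objective of this kind is meant to reward.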

Paper Structure

This paper contains 32 sections, 10 equations, 2 figures, 9 tables, 1 algorithm.

Figures (2)

  • Figure 1: An overview of the architecture of our proposed model. (a) The training and fine-tuning process in the source domain. Fine-tuning follows a similar approach to training but uses a different label prompt. Through contrastive learning, tokens belonging to the same entity type are pulled toward each other, while tokens of different entity types are pushed apart. This encourages the model to learn more distinct and effective representations of entity-specific information. The contrastive learning covers two aspects: context-context and context-label. (b) The inference process with nearest-neighbor prediction. Similarity scores between query tokens and support tokens are computed according to the distance metric.
  • Figure 2: Two-dimensional t-SNE visualizations of the FEW-NERD test set. The token representations come from six sampled fine-grained entity types in the location category. Left: CONTaiNER. Right: ours.
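
The nearest-neighbor inference step mentioned in the Figure 1 caption can be sketched as follows. This is a minimal illustration that assumes Euclidean distance between plain token vectors; the distance metric the authors actually use is not specified here, and the function name `nearest_neighbor_labels` is hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_labels(
    query_vecs: List[Vector],
    support_vecs: List[Vector],
    support_labels: List[str],
) -> List[str]:
    """Give each query token the label of its closest support token."""
    if not support_vecs or len(support_vecs) != len(support_labels):
        raise ValueError("support set must be non-empty with one label per vector")
    predictions = []
    for query in query_vecs:
        best = min(range(len(support_vecs)),
                   key=lambda i: euclidean(query, support_vecs[i]))
        predictions.append(support_labels[best])
    return predictions


# Toy support set: one labeled token vector per class.
support = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
support_tags = ["O", "PER", "LOC"]
queries = [[0.9, 1.2], [4.0, 6.0]]
predicted = nearest_neighbor_labels(queries, support, support_tags)
print(predicted)
```

Because prediction only compares distances to labeled support tokens, new entity types can be handled by swapping the support set, with no change to the classifier itself.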