Few-shot Name Entity Recognition on StackOverflow

Xinwei Chen, Kun Li, Tianyou Song, Jiangjian Guo

TL;DR

This paper tackles few-shot, fine-grained NER in the software domain using StackOverflow data. It introduces RoBERTa+MAML, combining a RoBERTa encoder with model-agnostic meta-learning to enable rapid adaptation to 27 entity types with limited labels; it also explores a prompt-based fine-tuning baseline and knowledge-based pattern extraction. Empirical results show a significant improvement in Micro-F1 (about 5 percentage points) over a RoBERTa baseline, and case studies demonstrate gains from data processing and pattern-driven enhancements for certain categories. The work suggests that meta-learning, domain-specific phrase processing, and knowledge-based rules can meaningfully improve domain-specific NER for software information retrieval and QA tasks.
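The Micro-F1 figure quoted above pools true positives, false positives, and false negatives over all entity types before computing precision and recall, so frequent types weigh more than rare ones. A minimal sketch of that computation follows. It assumes each sentence's entities are given as a set of `(start, end, type)` tuples; that representation, the function name `micro_f1`, and the toy type labels are illustrative choices, not taken from the paper.

```python
def micro_f1(gold, pred):
    """Entity-level micro-averaged F1.

    gold, pred: lists with one item per sentence; each item is a set of
    (start, end, entity_type) tuples (a hypothetical span format).
    A predicted entity counts as correct only on an exact span and type match.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted and present in gold
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data with illustrative labels: one of three predictions has the wrong type.
gold = [{(0, 1, "Library"), (3, 4, "Language")}, {(2, 3, "Class")}]
pred = [{(0, 1, "Library"), (3, 4, "Class")}, {(2, 3, "Class")}]
score = micro_f1(gold, pred)
```

Under this definition, a gain of about 5 percentage points means the pooled score rises by roughly 0.05 in absolute terms.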

Abstract

StackOverflow, with its vast question repository and limited labeled examples, raises an annotation challenge. We address this gap by proposing RoBERTa+MAML, a few-shot named entity recognition (NER) method leveraging meta-learning. Our approach, evaluated on the StackOverflow NER corpus (27 entity types), achieves a 5% F1 score improvement over the baseline. Domain-specific phrase processing further improves the results.
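To make the meta-learning idea concrete, the sketch below shows the inner/outer loop structure of MAML on a toy problem. It is not the authors' implementation: the RoBERTa encoder is replaced by a single scalar weight, each NER episode is replaced by a hypothetical 1-D regression task, and the outer update uses the first-order approximation (FOMAML) rather than full second-order MAML gradients. All names and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_task():
    """Toy task y = a * x with a task-specific slope a (stand-in for one few-shot episode)."""
    a = rng.uniform(0.5, 2.0)

    def draw(n):
        x = rng.uniform(-1.0, 1.0, size=n)
        return x, a * x

    return draw


def loss_and_grad(w, x, y):
    """Mean squared error of the model y_hat = w * x, and its gradient w.r.t. w."""
    err = w * x - y
    return np.mean(err ** 2), np.mean(2.0 * err * x)


def fomaml(meta_steps=200, inner_lr=0.1, outer_lr=0.05, k_shot=5):
    """First-order MAML: learn an initial w that adapts well after one gradient step."""
    w = 0.0
    for _ in range(meta_steps):
        draw = sample_task()
        x_support, y_support = draw(k_shot)  # few labeled examples for adaptation
        x_query, y_query = draw(k_shot)      # held-out examples from the same task

        # Inner loop: one gradient step on the support set.
        _, g_support = loss_and_grad(w, x_support, y_support)
        w_task = w - inner_lr * g_support

        # Outer loop: update the shared initialization using the query-set
        # gradient evaluated at the adapted weight (first-order approximation).
        _, g_query = loss_and_grad(w_task, x_query, y_query)
        w -= outer_lr * g_query
    return w


w_meta = fomaml()
```

In this toy setting the learned initialization settles inside the range of task slopes, so a single support-set step moves it close to any individual task; the same support/query structure is what lets a meta-trained encoder adapt to entity types with few labels.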

Paper Structure (17 sections, 9 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Prompt based tuning
  • Figure 2: Meta Training
  • Figure 3: Meta Testing
  • Figure 4: Different categories F1 Score
  • Figure 5: Example between different training data
  • ...and 1 more figure