Few-shot Name Entity Recognition on StackOverflow
Xinwei Chen, Kun Li, Tianyou Song, Jiangjian Guo
TL;DR
This paper tackles few-shot, fine-grained NER in the software domain using StackOverflow data. It introduces RoBERTa+MAML, combining a RoBERTa encoder with model-agnostic meta-learning to enable rapid adaptation to 27 entity types with limited labels; it also explores a prompt-based fine-tuning baseline and knowledge-based pattern extraction. Empirical results show a significant improvement in Micro-F1 (about 5 percentage points) over a RoBERTa baseline, and case studies demonstrate gains from data processing and pattern-driven enhancements for certain categories. The work suggests that meta-learning, domain-specific phrase processing, and knowledge-based rules can meaningfully improve domain-specific NER for software information retrieval and QA tasks.
Abstract
StackOverflow, with its vast question repository and limited labeled examples, raises an annotation challenge. We address this gap by proposing RoBERTa+MAML, a few-shot named entity recognition (NER) method that leverages meta-learning. Our approach, evaluated on the StackOverflow NER corpus (27 entity types), achieves a 5% F1 score improvement over the baseline. Domain-specific phrase processing improves the results further.
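
The Micro-F1 metric named in the TL;DR pools true positives, false positives, and false negatives across all entity types before computing precision and recall, so frequent types weigh more than rare ones. The following is a minimal sketch of that computation, assuming exact-match evaluation over (start, end, type) spans; the source does not state its matching criterion, and the function name `micro_f1` and the toy spans are illustrative, not taken from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity spans.

    gold, pred: lists with one set per sentence; each set holds
    (start, end, entity_type) tuples. A predicted span counts as
    correct only if it matches a gold span exactly (assumption).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans present in both gold and prediction
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two sentences, three gold entities.
gold = [{(0, 1, "Library"), (3, 4, "Class")}, {(2, 3, "Function")}]
pred = [{(0, 1, "Library"), (3, 4, "Function")}, {(2, 3, "Function")}]
score = micro_f1(gold, pred)
```

In this toy case one span has the right boundaries but the wrong type, so it counts as both a false positive and a false negative. Under this definition, an improvement of about 5 percentage points means the pooled score rises by roughly 0.05 in absolute terms.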
