Few-Shot Domain Adaptation for Named-Entity Recognition via Joint Constrained k-Means and Subspace Selection
Ayoub Hammal, Benno Uthayasooriyar, Caio Corro
TL;DR
This paper tackles few-shot named-entity recognition by formulating domain adaptation as a weakly supervised, constrained clustering problem. It extends $k$-means with label supervision, cluster-size (ratio) constraints on the O tag, and joint discriminative subspace selection, learning well-separated prototypes for entity types from both a small labeled support set and a large pool of unlabeled target-domain data. The training pipeline is deterministic and parameter-free, with a robust initialization, an efficient E-step, and a generalized eigenvalue solution for subspace learning. Empirically, the method achieves state-of-the-art results on several English NER benchmarks in both tag-set extension and domain-transfer scenarios, highlighting the value of unlabeled data and discriminative projections for few-shot adaptation.
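To make the clustering view concrete, the following is a minimal sketch of $k$-means with label supervision and a size constraint on one cluster. It is not the authors' implementation. The function name `constrained_kmeans`, the parameters `o_cluster` and `o_ratio`, and the equality form of the constraint (the O cluster receives a fixed fraction of the unlabeled points) are assumptions made for illustration, and subspace selection is omitted entirely.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def constrained_kmeans(points, support, k, o_cluster=0, o_ratio=0.5, n_iters=20):
    """Hypothetical sketch: k-means with pinned labels and a sized O cluster.

    points  : list of feature tuples (token embeddings in the NER setting)
    support : dict {point index: cluster id} for the labeled support set;
              every cluster needs at least one labeled point (assumption)
    o_ratio : assumed fraction of unlabeled points forced into the O cluster
    """
    # Deterministic initialization: each centroid is the mean of its labeled points.
    centroids = []
    for c in range(k):
        members = [points[i] for i, lab in support.items() if lab == c]
        if not members:
            raise ValueError(f"cluster {c} has no labeled support point")
        centroids.append(mean(members))

    entity_clusters = [c for c in range(k) if c != o_cluster]
    unlabeled = [i for i in range(len(points)) if i not in support]
    n_o = int(round(o_ratio * len(unlabeled)))

    def margin(i):
        # How much cheaper it is to put point i in O than in its best entity cluster.
        d_o = sq_dist(points[i], centroids[o_cluster])
        d_e = min(sq_dist(points[i], centroids[c]) for c in entity_clusters)
        return d_e - d_o

    assign = {}
    for _ in range(n_iters):
        # Assignment step: labeled points keep their labels (label supervision).
        new_assign = dict(support)
        # The n_o unlabeled points with the largest margin go to O, which
        # minimizes the total cost under an exact size constraint on one cluster.
        ranked = sorted(unlabeled, key=lambda i: (-margin(i), i))
        for rank, i in enumerate(ranked):
            if rank < n_o:
                new_assign[i] = o_cluster
            else:
                new_assign[i] = min(
                    entity_clusters, key=lambda c: sq_dist(points[i], centroids[c])
                )
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            centroids[c] = mean([points[i] for i, lab in assign.items() if lab == c])
    return assign, centroids


# Toy data: cluster 0 plays the role of the O tag, clusters 1 and 2 are entity types.
points = [
    (0.0, 0.0), (5.0, 5.0), (-5.0, 5.0),                # labeled support
    (0.2, 0.1), (-0.1, 0.3), (4.8, 5.1), (-5.2, 4.9),   # unlabeled pool
]
support = {0: 0, 1: 1, 2: 2}
assign, centroids = constrained_kmeans(points, support, k=3, o_ratio=0.5)
```

In this sketch the size constraint only affects how unlabeled points are distributed: raising `o_ratio` forces more of them into the O cluster even when an entity centroid is closer, which is the kind of prior a ratio constraint encodes.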
Abstract
Named-entity recognition (NER) is a task that typically requires large annotated datasets, which limits its applicability across domains with varying entity definitions. This paper addresses few-shot NER, aiming to transfer knowledge to new domains with minimal supervision. Unlike previous approaches that rely solely on limited annotated data, we propose a weakly supervised algorithm that combines small labeled datasets with large amounts of unlabeled data. Our method extends the k-means algorithm with label supervision, cluster size constraints and domain-specific discriminative subspace selection. This unified framework achieves state-of-the-art results in few-shot NER on several English datasets.
