ELLEN: Extremely Lightly Supervised Learning For Efficient Named Entity Recognition

Haris Riaz; Razvan-Gabriel Dumitru; Mihai Surdeanu

TL;DR

ELLEN tackles named entity recognition under extremely light supervision by leveraging a lexicon of 10 exemplars per class within a modular neuro-symbolic framework. It combines encoder-based language models with linguistic rules in a three-stage self-training sieve that incorporates a fully unsupervised MLM heuristic, dynamic window filtering, one-sense-per-discourse propagation, global disambiguation rules, and confidence-based data selection. The approach yields strong results on CoNLL-2003 at 1% supervision (e.g., 76.87 F1), competitive performance at 5% supervision, and robust zero-shot performance on WNUT-17, rivaling certain LLM baselines while remaining efficient. Overall, ELLEN demonstrates that a carefully designed, encoder-only, neuro-symbolic system can surpass many semi-supervised methods with far less supervision, though it relies on domain- and language-specific rules, and the authors acknowledge potential noise under full supervision.

Abstract

In this work, we revisit the problem of semi-supervised named entity recognition (NER) focusing on extremely light supervision, consisting of a lexicon containing only 10 examples per class. We introduce ELLEN, a simple, fully modular, neuro-symbolic method that blends fine-tuned language models with linguistic rules. These rules include insights such as "One Sense Per Discourse", using a Masked Language Model as an unsupervised NER, leveraging part-of-speech tags to identify and eliminate unlabeled entities as false negatives, and other intuitions about classifier confidence scores in local and global context. ELLEN achieves very strong performance on the CoNLL-2003 dataset when using the minimal supervision from the lexicon above. It also outperforms most existing (and considerably more complex) semi-supervised NER methods under the same supervision settings commonly used in the literature (i.e., 5% of the training data). Further, we evaluate our CoNLL-2003 model in a zero-shot scenario on WNUT-17 where we find that it outperforms GPT-3.5 and achieves comparable performance to GPT-4. In a zero-shot setting, ELLEN also achieves over 75% of the performance of a strong, fully supervised model trained on gold data. Our code is available at: https://github.com/hriaz17/ELLEN.
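
To make the "One Sense Per Discourse" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a document is represented as a list of (surface form, label) pairs, where the label is `None` for unlabeled mentions, and that the majority label of a surface form within the document is propagated to all of its mentions. The function name, the lowercasing of surface forms, and the tie-breaking behavior (first label seen wins) are illustrative assumptions.

```python
from collections import Counter, defaultdict


def propagate_one_sense_per_discourse(mentions):
    """Give every mention of a surface form the majority label it has
    received elsewhere in the same document.

    `mentions` is a list of (surface_form, label_or_None) pairs for ONE
    document. This representation is an assumption made for illustration.
    """
    # Count how often each label was assigned to each surface form.
    votes = defaultdict(Counter)
    for surface, label in mentions:
        if label is not None:
            votes[surface.lower()][label] += 1

    resolved = []
    for surface, label in mentions:
        counts = votes.get(surface.lower())
        if counts:
            # Majority vote; ties go to the label counted first.
            majority = counts.most_common(1)[0][0]
            resolved.append((surface, majority))
        else:
            # No labeled mention of this form in the document: leave as-is.
            resolved.append((surface, label))
    return resolved


doc = [
    ("Arizona", "LOC"),
    ("Arizona", None),
    ("Arizona", "ORG"),
    ("Arizona", "LOC"),
    ("Tucson", None),
]
resolved_doc = propagate_one_sense_per_discourse(doc)
```

In this toy document, the unlabeled and the minority-labeled mentions of "Arizona" both receive the majority label, while "Tucson" stays unlabeled because no mention of it carries a label.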

Paper Structure (24 sections, 3 equations, 1 figure, 13 tables, 1 algorithm)

Introduction
Related Works
Proposed Method
Unsupervised Entity Recognition Using A Masked Language Model (MLM)
Dynamic Window Filtering
Global Rules
One Sense Per Discourse
Confidence-Based Rules
Minimizing The Dependency On A Lexicon
ELLEN: Integrating Neural And Symbolic Components
Experimental Results
Data & Setup
Results Using 1% Labeled Data
Results Using 5% Labeled Data
Zero-Shot Evaluation
...and 9 more sections

Figures (1)

Figure 1: The proposed method illustrated. $D$ refers to a subset of the unlabeled data which is added back to the labeled data for retraining in the next iteration. OSPD refers to the "One Sense Per Discourse" rule; "Global rules" indicate the rules described in the Global Rules section. The colors used in the figure represent the decreasing quality of the generated annotations in the three stages, after the fine-tuning stage: green $\rightarrow$ orange $\rightarrow$ blue.
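
As a rough illustration of how a subset $D$ of the unlabeled data might be chosen for retraining, the sketch below keeps only sentences in which every token prediction is confident. This is an assumption-laden example rather than the paper's procedure: the sentence representation, the function name `select_confident`, the 0.9 threshold, and the use of the minimum token confidence as the sentence score are all hypothetical.

```python
def select_confident(predictions, threshold=0.9):
    """Split model predictions into a confident subset D and the rest.

    `predictions` is a list of sentences; each sentence is a list of
    (token, predicted_label, confidence) triples. The threshold value
    and the min-over-tokens criterion are illustrative assumptions.
    """
    selected, remaining = [], []
    for sentence in predictions:
        # A sentence qualifies only if its least confident token
        # still meets the threshold; empty sentences never qualify.
        if sentence and min(conf for _, _, conf in sentence) >= threshold:
            selected.append(sentence)
        else:
            remaining.append(sentence)
    return selected, remaining


unlabeled_predictions = [
    [("Tucson", "LOC", 0.97), ("is", "O", 0.99), ("hot", "O", 0.98)],
    [("Apple", "ORG", 0.55), ("fell", "O", 0.93)],
]
D, rest = select_confident(unlabeled_predictions, threshold=0.9)
```

Here the first sentence would be added back to the labeled data, while the second stays in the unlabeled pool for a later iteration.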
