Named Entity Recognition with Bidirectional LSTM-CNNs

Jason P. C. Chiu, Eric Nichols

TL;DR

This paper introduces a hybrid bidirectional LSTM–CNN model for named entity recognition (NER) that automatically learns word- and character-level features, reducing the need for hand-crafted feature engineering. It also proposes a novel scheme for encoding partial lexicon matches with BIOES tags, which lets the model use public lexicons (built from SENNA and DBpedia) alongside word embeddings, capitalization features, and character information. Trained with a CRF-like sentence-level objective and decoded with the Viterbi algorithm, the model surpasses the previous state of the art on OntoNotes 5.0 and is competitive on CoNLL-2003 using only tokenized text and public word embeddings; adding the two lexicons yields new state-of-the-art F1 scores on both datasets (91.62 on CoNLL-2003, 86.28 on OntoNotes). The results show that a neural architecture can learn much of what was previously hand-engineered while still benefiting from external lexical knowledge, and they point to the importance of domain-appropriate word embeddings and a robust lexicon encoding. Future work points to better lexicon construction and domain adaptation to further improve NER performance.
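As a rough illustration of the Viterbi decoding step, the sketch below finds the highest-scoring tag sequence, assuming (as in CRF-style taggers) that a sentence's score is the sum of per-token tag scores produced by the network and tag-to-tag transition scores. The function name `viterbi_decode`, the optional start scores, and the list-of-lists score layout are illustrative assumptions, not the authors' code.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Optional[Sequence[float]] = None,
) -> Tuple[List[int], float]:
    """Return the best tag-index path and its total score.

    emissions[t][j]   : network score for tag j at token t (assumed additive, e.g. log-probs)
    transitions[i][j] : score for moving from tag i to tag j
    start[j]          : score for starting the sentence with tag j (defaults to 0)
    """
    if not emissions:
        return [], 0.0
    n_tags = len(emissions[0])
    if start is None:
        start = [0.0] * n_tags
    # score[j] = best score of any path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i to transition into tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Follow the back-pointers from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best_last]


# Toy tags: 0 = O, 1 = B-PER, 2 = E-PER. A large negative transition score
# forbids B-PER -> O, so the decoder must close the entity with E-PER.
emissions = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.75]]
transitions = [
    [0, 0, -10],    # from O
    [-10, -10, 0],  # from B-PER
    [0, 0, -10],    # from E-PER
]
print(viterbi_decode(emissions, transitions))  # → ([1, 2], 1.75)
```

Picking the best tag independently at each token would choose B-PER followed by O here; the transition scores are what make the jointly decoded path differ from the greedy one.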

Abstract

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering. We also propose a novel method of encoding partial lexicon matches in neural networks and compare it to existing approaches. Extensive evaluation shows that, given only tokenized text and publicly available word embeddings, our system is competitive on the CoNLL-2003 dataset and surpasses the previously reported state-of-the-art performance on the OntoNotes 5.0 dataset by 2.13 F1 points. By using two lexicons constructed from publicly available sources, we establish new state-of-the-art performance, with F1 scores of 91.62 on CoNLL-2003 and 86.28 on OntoNotes, surpassing systems that employ heavy feature engineering, proprietary lexicons, and rich entity linking information.
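To make the character-level half of the architecture more concrete, here is a minimal pure-Python sketch of a character CNN that turns a word of any length into a fixed-length feature vector: each character is mapped to an embedding concatenated with a small character-type vector, a 1-D convolution slides over the padded character sequence, and max pooling over positions keeps one value per filter. The random initialization, the four character-type categories, the `<pad>`/`<unk>` entries, max-over-time pooling, and all function names are assumptions for illustration; in a trained model the embeddings and filters would be learned.

```python
import random
from typing import Dict, List

Vector = List[float]
N_TYPES = 4  # hypothetical character types: upper, lower, digit, other


def char_type(c: str) -> Vector:
    """One-hot character-type vector (categories are an assumption)."""
    flags = [c.isupper(), c.islower(), c.isdigit()]
    flags.append(not any(flags))
    return [1.0 if f else 0.0 for f in flags]


def make_embeddings(alphabet: str, dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Random character lookup table; a trained model would learn these values."""
    rng = random.Random(seed)
    table = {c: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for c in alphabet}
    table["<unk>"] = [0.0] * dim  # characters not in the alphabet
    table["<pad>"] = [0.0] * dim  # padding at word boundaries
    return table


def make_filters(n_filters: int, width: int, in_dim: int, seed: int = 1) -> List[List[Vector]]:
    """Each filter holds `width` weight vectors, one per character position."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(width)]
            for _ in range(n_filters)]


def char_cnn_features(word: str, emb: Dict[str, Vector],
                      filters: List[List[Vector]]) -> Vector:
    """Convolve over the padded characters of a non-empty word, then max-pool."""
    width = len(filters[0])
    half = width // 2
    pad_vec = emb["<pad>"] + [0.0] * N_TYPES
    # Each character: embedding concatenated with its character-type vector.
    vecs = [pad_vec] * half
    vecs += [emb.get(c, emb["<unk>"]) + char_type(c) for c in word]
    vecs += [pad_vec] * half
    features = []
    for f in filters:
        best = float("-inf")
        for start in range(len(vecs) - width + 1):
            score = sum(w * x
                        for k in range(width)
                        for w, x in zip(f[k], vecs[start + k]))
            best = max(best, score)
        features.append(best)  # one value per filter, whatever the word length
    return features


alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
emb = make_embeddings(alphabet, dim=8)
filters = make_filters(n_filters=5, width=3, in_dim=8 + N_TYPES)
short_vec = char_cnn_features("EU", emb, filters)
long_vec = char_cnn_features("Washington", emb, filters)
print(len(short_vec), len(long_vec))  # → 5 5
```

The max pooling is what makes the output size depend only on the number of filters, so a two-letter acronym and a ten-letter name both yield a vector that can be concatenated with the word-level features.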

Paper Structure

This paper contains 34 sections, 2 equations, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: The (unrolled) BLSTM for tagging named entities. Multiple tables look up word-level feature vectors. The CNN (Figure 2) extracts a fixed-length feature vector from character-level features. For each word, these vectors are concatenated and fed to the BLSTM network and then to the output layers (Figure 3).
  • Figure 2: The convolutional neural network extracts character features from each word. The character embedding and (optionally) the character type feature vector are computed through lookup tables. Then, they are concatenated and passed into the CNN.
  • Figure 3: The output layers ("Out" in Figure 1) decode the BLSTM output into a score for each tag category.
  • Figure 4: Example of how lexicon features are applied. The B, I, and E markings indicate that the token matches the Begin, Inside, or End token of an entry in the lexicon. S indicates that the token matches a single-token entry.
  • Figure 5: Fraction of named entities of each tag category matched completely by entries in each lexicon category of the SENNA/DBpedia combined lexicon. White = higher fraction.
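The B/I/E/S marking illustrated in Figure 4 can be sketched for a single lexicon category as follows. This version does only exact, case-insensitive matching, scanning left to right and preferring the longest match; the paper's partial-matching rules are not reproduced. The function name, the lowercased-tuple lexicon format, and the `max_len` cap on entry length are hypothetical choices for illustration.

```python
from typing import List, Set, Tuple


def bioes_lexicon_features(tokens: List[str],
                           lexicon: Set[Tuple[str, ...]],
                           max_len: int = 4) -> List[str]:
    """Mark tokens covered by a lexicon entry with B/I/E/S; all others get O.

    Assumptions: entries are stored as lowercased token tuples, matching is
    exact (no partial matches), and longer matches win over shorter ones.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate n-gram starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in lexicon:
                matched = n
                break
        if matched == 0:
            i += 1
            continue
        if matched == 1:
            tags[i] = "S"  # single-token entry
        else:
            tags[i] = "B"  # first token of a multi-token entry
            for j in range(i + 1, i + matched - 1):
                tags[j] = "I"  # inside tokens
            tags[i + matched - 1] = "E"  # last token
        i += matched  # matches do not overlap
    return tags


tokens = ["Jason", "lives", "in", "New", "York", "City"]
lexicon = {("new", "york", "city"), ("jason",)}
print(bioes_lexicon_features(tokens, lexicon))  # → ['S', 'O', 'O', 'B', 'I', 'E']
```

Per Figure 1, word-level features like these are looked up in tables and concatenated with the other per-word vectors before entering the BLSTM; the sketch stops at producing the tags for one category.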