Neural Architectures for Named Entity Recognition

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer

TL;DR

The paper tackles named entity recognition with small supervised corpora by introducing two neural architectures that avoid language-specific resources: a bidirectional LSTM with a CRF layer (LSTM-CRF) and a transition-based Stack-LSTM that builds and labels chunks. Both models combine character-level word representations with pretrained word embeddings, and use dropout so that the model learns to rely on both sources rather than on one alone. They achieve state-of-the-art results on Dutch, German, and Spanish, and near state-of-the-art results on English, all without gazetteers. The work shows that language-agnostic NER is feasible when orthographic and distributional information are fused within end-to-end neural architectures.

Abstract

State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures---one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers.

Paper Structure

This paper contains 20 sections, 7 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Main architecture of the network. Word embeddings are given to a bidirectional LSTM. $\mathbf{l}_i$ represents the word $i$ and its left context, $\mathbf{r}_i$ represents the word $i$ and its right context. Concatenating these two vectors yields a representation of the word $i$ in its context, $\mathbf{c}_i$.
  • Figure 2: Transitions of the Stack-LSTM model indicating the action applied and the resulting state. Bold symbols indicate (learned) embeddings of words and relations, script symbols indicate the corresponding words and relations.
  • Figure 3: Transition sequence for "Mark Watney visited Mars" with the Stack-LSTM model.
  • Figure 4: The character embeddings of the word "Mars" are given to a bidirectional LSTM. We concatenate its last outputs to an embedding from a lookup table to obtain a representation for this word.
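
The transition sequence referred to in Figure 3 can be illustrated with a small sketch. It assumes the paper's three actions: SHIFT (move the next word from the buffer to the stack), OUT (move the next word from the buffer directly to the output), and REDUCE(y) (pop the whole stack as one chunk labeled y). The function name `run_transitions`, the string encoding of actions, and the use of `"O"` to mark non-entity words are hypothetical choices made here for illustration. The sketch only replays a given action sequence; in the actual model the actions are predicted by Stack-LSTMs, which this code does not attempt to reproduce.

```python
from typing import List, Tuple

# (text, label); the label "O" for non-entity words is an assumption of this sketch.
Chunk = Tuple[str, str]


def run_transitions(words: List[str], actions: List[str]) -> List[Chunk]:
    """Replay SHIFT / OUT / REDUCE(label) actions and return the labeled output."""
    buffer = list(words)      # words still to be processed, left to right
    stack: List[str] = []     # words of the chunk currently being built
    output: List[Chunk] = []  # finished chunks and non-entity words

    for action in actions:
        if action == "SHIFT":
            # Move the next word from the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif action == "OUT":
            # Move the next word from the buffer straight to the output.
            output.append((buffer.pop(0), "O"))
        elif action.startswith("REDUCE(") and action.endswith(")"):
            # Pop everything on the stack as a single labeled chunk.
            label = action[len("REDUCE("):-1]
            output.append((" ".join(stack), label))
            stack.clear()
        else:
            raise ValueError(f"unknown action: {action}")

    if buffer or stack:
        raise ValueError("action sequence did not consume the whole sentence")
    return output


sentence = "Mark Watney visited Mars".split()
actions = ["SHIFT", "SHIFT", "REDUCE(PER)", "OUT", "SHIFT", "REDUCE(LOC)"]
result = run_transitions(sentence, actions)
print(result)
```

Because a chunk is labeled once as a whole at REDUCE time, multi-word entities such as "Mark Watney" receive a single label rather than one tag per token.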