A fast and sound tagging method for discontinuous named-entity recognition

Caio Corro

TL;DR

This work introduces a novel tagging scheme for discontinuous named-entity recognition. The scheme explicitly describes the inner structure of discontinuous mentions and relies on a weighted finite-state automaton for both marginal and maximum a posteriori inference.

Abstract

We introduce a novel tagging scheme for discontinuous named-entity recognition based on an explicit description of the inner structure of discontinuous mentions. We rely on a weighted finite-state automaton for both marginal and maximum a posteriori inference. As such, our method is sound in the sense that (1) well-formedness of predicted tag sequences is ensured via the automaton structure and (2) there is an unambiguous mapping between well-formed sequences of tags and (discontinuous) mentions. We evaluate our approach on three English datasets in the biomedical domain and report results comparable to the state of the art with a much simpler and faster model.
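To make the role of the automaton concrete, here is a minimal sketch of automaton-constrained inference. It does not reproduce the paper's grammar or tag set: the three-tag BIO automaton, the names `TAGS`, `ALLOWED_START`, `ALLOWED`, and the functions `viterbi` and `log_partition` are assumptions chosen for illustration. The sketch only shows the general idea that restricting transitions to those the automaton accepts yields well-formed tag sequences for maximum a posteriori inference (max-sum) and a normalizer over well-formed sequences for marginal inference (log-sum-exp).

```python
import math

NEG_INF = float("-inf")

# Toy automaton (an assumption, NOT the paper's grammar):
# a sequence may not start with "I", and "I" may only follow "B" or "I".
TAGS = ["O", "B", "I"]
ALLOWED_START = {"O", "B"}
ALLOWED = {
    ("O", "O"), ("O", "B"),
    ("B", "O"), ("B", "B"), ("B", "I"),
    ("I", "O"), ("I", "B"), ("I", "I"),
}


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def viterbi(scores):
    """MAP inference: best-scoring tag sequence accepted by the automaton.

    scores is a list with one dict per token mapping each tag to its score.
    """
    best = [{t: scores[0][t] if t in ALLOWED_START else NEG_INF for t in TAGS}]
    back = []
    for i in range(1, len(scores)):
        cur, ptr = {}, {}
        for t in TAGS:
            # Only predecessors with an allowed transition into t are considered.
            s, p = max((best[-1][p], p) for p in TAGS if (p, t) in ALLOWED)
            cur[t] = s + scores[i][t]
            ptr[t] = p
        best.append(cur)
        back.append(ptr)
    last = max(TAGS, key=lambda t: best[-1][t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[-1][last], path


def log_partition(scores):
    """Forward algorithm: log-sum of scores over all well-formed sequences."""
    alpha = {t: scores[0][t] if t in ALLOWED_START else NEG_INF for t in TAGS}
    for i in range(1, len(scores)):
        alpha = {
            t: scores[i][t]
            + logsumexp([alpha[p] for p in TAGS if (p, t) in ALLOWED])
            for t in TAGS
        }
    return logsumexp(list(alpha.values()))


# Per-token scores where the unconstrained argmax (I, I, O) is ill-formed.
example_scores = [
    {"O": 0.0, "B": 0.5, "I": 2.0},
    {"O": 0.0, "B": 0.0, "I": 1.0},
    {"O": 1.0, "B": 0.0, "I": 0.0},
]
print(viterbi(example_scores)[1])  # ['B', 'I', 'O']
```

Both functions run the same dynamic program over the same transition set and differ only in the operation used to combine paths (max versus log-sum-exp), which is why a single automaton can serve both kinds of inference.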

Paper Structure (16 sections, 10 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: (Top) Sentence with its original annotation. It contains two continuous mentions ("Chronic fatigue" and "stiff knees") and three discontinuous mentions ("swollen knees", "swollen left elbows" and "stiff left elbows"). (Bottom) Sentence annotated with our two-layer representation and the associated tag sequence.
  • Figure 2: The grammar automaton we propose for discontinuous named-entity recognition.