Transformer-based Named Entity Recognition with Combined Data Representation

Michał Marcińczuk

TL;DR

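This study investigates three data representation strategies for transformer-based named entity recognition: single, merged, and context. Each input vector holds, respectively, one sentence, multiple sentences, or sentences joined with surrounding context that is used only for attention. Training on all three strategies together is proposed to improve model stability and adaptability.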

Abstract

This study examines transformer-based models and their effectiveness in named entity recognition tasks. The study investigates data representation strategies, including single, merged, and context, which respectively use one sentence, multiple sentences, and sentences joined with attention to context per vector. Analysis shows that training models with a single strategy may lead to poor performance on different data representations. To address this limitation, the study proposes a combined training procedure that utilizes all three strategies to improve model stability and adaptability. The results of this approach are presented and discussed for four languages (English, Polish, Czech, and German) across various datasets, demonstrating the effectiveness of the combined strategy.

Paper Structure (24 sections, 6 figures, 13 tables)

Figures (6)

  • Figure 1: The neural network architecture for named entity recognition.
  • Figure 2: Visualization of the single sentence data representation. The blue parts represent the subtokens that the model will process. Grey parts are padding that fills the vector to the maximum sequence size.
  • Figure 3: Visualization of the merged data representation. The blue parts represent the subtokens that the model will process. Grey parts are padding that fills the vector to the maximum sequence size.
  • Figure 4: Visualization of the context data representation. The blue parts represent the subtokens that the model will process. Grey parts are padding that fills the vector to the maximum sequence size. The 25% shown is an example context width. The context parts are considered when calculating attention but are not passed through the classification layer.
  • Figure 5: Schema of named entity categories in the CNEC 2.0 dataset.
  • ...and 1 more figure
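
The three representations described in the abstract and in the captions of Figures 2 to 4 can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the "[PAD]" symbol, the greedy packing rule, and the reading of the context width as a fraction of the maximum sequence size applied on each side are all assumptions made for illustration. Each vector is returned as a pair: the subtokens, and a mask marking which positions would be passed to the classification layer.

```python
from typing import List, Tuple

PAD = "[PAD]"  # hypothetical padding symbol
Vector = Tuple[List[str], List[bool]]  # (subtokens, classify-this-position mask)


def pad(tokens: List[str], mask: List[bool], max_len: int) -> Vector:
    """Fill the vector with padding up to the maximum sequence size."""
    n = max_len - len(tokens)
    return tokens + [PAD] * n, mask + [False] * n


def pack(sentences: List[List[str]], budget: int) -> List[List[str]]:
    """Greedily group consecutive sentences into chunks of at most `budget` subtokens."""
    chunks, cur = [], []
    for s in sentences:
        s = s[:budget]  # assumption: overlong sentences are simply truncated
        if cur and len(cur) + len(s) > budget:
            chunks.append(cur)
            cur = []
        cur = cur + s
    if cur:
        chunks.append(cur)
    return chunks


def single(sentences: List[List[str]], max_len: int) -> List[Vector]:
    """Single: one sentence per vector."""
    return [pad(s[:max_len], [True] * len(s[:max_len]), max_len) for s in sentences]


def merged(sentences: List[List[str]], max_len: int) -> List[Vector]:
    """Merged: as many consecutive sentences per vector as fit."""
    return [pad(c, [True] * len(c), max_len) for c in pack(sentences, max_len)]


def context(sentences: List[List[str]], max_len: int, ctx_frac: float = 0.25) -> List[Vector]:
    """Context: merged core plus neighbouring subtokens that are attended to but not classified."""
    ctx = int(max_len * ctx_frac)   # context width on each side (assumed interpretation)
    chunks = pack(sentences, max_len - 2 * ctx)
    flat = [t for c in chunks for t in c]
    out, start = [], 0
    for c in chunks:
        end = start + len(c)
        left, right = flat[max(0, start - ctx):start], flat[end:end + ctx]
        mask = [False] * len(left) + [True] * len(c) + [False] * len(right)
        out.append(pad(left + c + right, mask, max_len))
        start = end
    return out


doc = [["a", "b", "c"], ["d", "e"], ["f", "g", "h", "i"]]
for name, fn in [("single", single), ("merged", merged), ("context", context)]:
    print(name, [tokens for tokens, _ in fn(doc, 8)])
```

In this sketch the mask plays the role of the distinction drawn in Figure 4: context positions appear in the input, so they can influence attention, but a training loop would exclude them from the classification loss.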