Embedded Named Entity Recognition using Probing Classifiers

Nicholas Popovič, Michael Färber

TL;DR

This work proposes an approach called EMBER which enables streaming named entity recognition in decoder-only language models without fine-tuning them and while incurring minimal additional computational cost at inference time.

Abstract

Streaming text generation has become a common way of increasing the responsiveness of language-model-powered applications, such as chat assistants. At the same time, extracting semantic information from generated text is a useful tool for applications such as automated fact-checking or retrieval-augmented generation. Currently, this requires either separate models during inference, which increases computational cost, or destructive fine-tuning of the language model. Instead, we propose an approach called EMBER which enables streaming named entity recognition in decoder-only language models without fine-tuning them and while incurring minimal additional computational cost at inference time. Specifically, our experiments show that EMBER maintains high token generation rates, with only a negligible decrease in speed of around 1% compared to a 43.64% slowdown measured for a baseline. We make our code and data available online, including a toolkit for training, testing, and deploying efficient token classification models optimized for streaming text generation.

Paper Structure (38 sections, 3 equations, 10 figures, 20 tables)

Figures (10)

  • Figure 1: EMBER enables simultaneous text generation and entity annotation by using a language model's internal representations as the feature space for classification. Compared to using state-of-the-art NER models, this results in a substantially more efficient pipeline allowing for streaming named entity recognition. Parameter and latency comparisons stated in this figure are based on the experiments conducted using GPT-2$_{\text{XL}}$, presented in section \ref{sec:simgenNER}.
  • Figure 2: Illustration of the proposed approach for named entity recognition using probing classifiers. Black squares symbolize individual transformer layers at individual timesteps, while dotted lines symbolize information flow throughout the transformer. Probing classifiers are shown in red, with circles symbolizing where representations are accessed. One classifier performs token-level entity typing using hidden states at a single layer, while a second classifier detects spans based on attention weights. Both predictions are aggregated into span-level entity predictions.
  • Figure 3: Illustration of the different span detection methods. Red colors indicate which attention weights to classify as positive for the example span "New York Film Festival". Attention weights are only shown for a single layer, but are generally used at all layers.
  • Figure 4: Entity typing F1 scores (validation set) for models with respect to hidden state dimension.
  • Figure 5: Mention detection F1 scores (validation set) for models with respect to the total number of attention heads.
  • ...and 5 more figures
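
The two-probe pipeline described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, the hand-set probe weights, and the toy hidden states and attention features are hypothetical. The sketch also assumes one particular span detection variant (classifying the attention from a span's last token to its first token) and a majority vote for aggregation, neither of which is confirmed by the text above. In practice the features would come from a frozen decoder-only language model and the probes would be trained.

```python
from typing import List, Tuple

def type_tokens(hidden_states: List[List[float]],
                weights: List[List[float]],
                bias: List[float],
                labels: List[str]) -> List[str]:
    """Token-level entity typing: a linear probe over one layer's hidden states."""
    predictions = []
    for h in hidden_states:
        scores = [sum(w * x for w, x in zip(row, h)) + b
                  for row, b in zip(weights, bias)]
        predictions.append(labels[scores.index(max(scores))])
    return predictions

def detect_spans(attn: List[List[List[float]]],
                 weights: List[float],
                 bias: float) -> List[Tuple[int, int]]:
    """Span detection: a linear probe over attention-weight features.

    attn[i][j] holds the attention weights from token i to token j (j <= i),
    flattened over layers and heads. A positive score marks (j, i) as the
    first and last token of one span (assumed variant, see lead-in).
    """
    spans = []
    for i, row in enumerate(attn):
        for j in range(i + 1):
            score = sum(w * a for w, a in zip(weights, row[j])) + bias
            if score > 0:
                spans.append((j, i))
    return spans

def aggregate(spans: List[Tuple[int, int]],
              token_types: List[str]) -> List[Tuple[int, int, str]]:
    """Combine both probes: majority vote over token types inside each span."""
    entities = []
    for start, end in spans:
        types = token_types[start:end + 1]
        best = max(sorted(set(types)), key=types.count)
        if best != "O":  # "O" means "not an entity"
            entities.append((start, end, best))
    return entities

# Toy example built around the span "New York Film Festival" from Figure 3.
tokens = ["the", "New", "York", "Film", "Festival"]
labels = ["O", "EVENT"]

# Hypothetical 2-dimensional hidden states and hand-set typing probe.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
type_weights = [[1.0, 0.0], [0.0, 1.0]]
type_bias = [0.0, 0.0]

# Hypothetical attention features (2 heads); only "Festival" -> "New" is strong.
attn = [[[0.0, 0.0] for _ in range(i + 1)] for i in range(len(tokens))]
attn[4][1] = [0.9, 0.8]
span_weights = [1.0, 1.0]
span_bias = -1.0

token_types = type_tokens(hidden_states, type_weights, type_bias, labels)
spans = detect_spans(attn, span_weights, span_bias)
entities = aggregate(spans, token_types)

for start, end, label in entities:
    print(" ".join(tokens[start:end + 1]), "->", label)
```

Because both probes only read representations that the language model already computes during generation, a design of this kind adds little work per generated token, which is consistent with the small slowdown reported in the abstract.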