Uncertainty Estimation on Sequential Labeling via Uncertainty Transmission

Jianfeng He, Linlin Yu, Shuo Lei, Chang-Tien Lu, Feng Chen

TL;DR

This work tackles uncertainty estimation for sequential labeling in NER (UE-NER) by introducing the Sequential Labeling Posterior Network (SLPN), which transmits uncertainty across tokens via a revised self-attention mechanism. Building on evidential deep learning, SLPN combines token-level Dirichlet posterior evidence $\boldsymbol{\beta}^{\text{post}}$ with an uncertainty transmission term $\boldsymbol{\beta}^{\text{trans}}$ to form $\boldsymbol{\beta}^{\text{agg}} = \boldsymbol{\beta}^{\text{post}} + \boldsymbol{\beta}^{\text{trans}}$; the resulting Dirichlet parameters $\boldsymbol{\alpha}^{\text{agg}} = \boldsymbol{\beta}^{\text{agg}} + \boldsymbol{\beta}^{\text{prior}}$ drive the token-level uncertainty estimates. An evaluation framework assesses OOD detection and wrong-span (WS) entity detection separately. Experiments on MIT-Restaurant, Mov-Sim, and Mov-Com show that SLPN improves the weighted OOD/WS metrics and that the transmitted uncertainty component contributes to these gains, though WS detection remains challenging. The approach benefits safety-critical information extraction by improving OOD detection without sacrificing NER accuracy, and it points to future work on WS-focused improvements and broader model generalization.
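The two sums in the TL;DR can be illustrated for a single token with a minimal Python sketch. The function name `aggregate_dirichlet`, the flat all-ones prior, and the use of vacuity ($c / \alpha_0$, a common evidential deep learning measure) as the uncertainty score are assumptions made for illustration; the paper may use different priors and uncertainty measures.

```python
def aggregate_dirichlet(beta_post, beta_trans, beta_prior=None):
    """Combine evidence for one token over c classes.

    Implements beta_agg = beta_post + beta_trans and
    alpha_agg = beta_agg + beta_prior, as stated in the TL;DR.
    Returns (alpha_agg, expected class probabilities, vacuity).
    """
    c = len(beta_post)
    if beta_prior is None:
        beta_prior = [1.0] * c  # ASSUMPTION: flat prior of one pseudo-count per class
    beta_agg = [p + t for p, t in zip(beta_post, beta_trans)]
    alpha_agg = [b + q for b, q in zip(beta_agg, beta_prior)]
    alpha_0 = sum(alpha_agg)  # Dirichlet strength (total pseudo-counts)
    expected_probs = [a / alpha_0 for a in alpha_agg]  # mean of the Dirichlet
    vacuity = c / alpha_0  # ASSUMPTION: vacuity as the uncertainty score
    return alpha_agg, expected_probs, vacuity


# Hypothetical evidence counts for a 3-class token.
alpha, probs, u = aggregate_dirichlet([4.0, 1.0, 0.0], [2.0, 0.0, 0.0])
print(alpha, probs, u)
```

In this sketch, more total evidence (own plus transmitted) raises $\alpha_0$ and therefore lowers the vacuity, so a token that receives little evidence from its context stays comparatively uncertain.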

Abstract

Sequential labeling is the task of predicting a label for each token in a sequence, as in Named Entity Recognition (NER). NER tasks aim to extract entities and predict their labels given a text, which is important in information extraction. Although previous works have shown great progress in improving NER performance, uncertainty estimation on NER (UE-NER) is still underexplored but essential. This work focuses on UE-NER, which aims to estimate uncertainty scores for the NER predictions. Previous uncertainty estimation models often overlook two unique characteristics of NER: the connection between entities (i.e., one entity embedding is learned based on the other ones) and wrong-span cases in the entity extraction subtask. Therefore, we propose a Sequential Labeling Posterior Network (SLPN) to estimate uncertainty scores for the extracted entities, considering uncertainty transmitted from other tokens. Moreover, we define an evaluation strategy to address the specificity of wrong-span cases. Our SLPN achieves significant improvements on three datasets, such as a 5.54-point improvement in AUPR on the MIT-Restaurant dataset. Our code is available at https://github.com/he159ok/UncSeqLabeling_SLPN.

Paper Structure

This paper contains 23 sections, 18 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: In this example, though the tokens "Samsung" and "Inc." both have the same uncertainty score of 0.4, the context in the right case exhibits higher uncertainty. This suggests that "Inc." should be considered more uncertain than "Samsung." Therefore, we propose transmitting the predicted uncertainty from other tokens to a given token.
  • Figure 2: (a) A diagram of our SLPN model, illustrating how uncertainty transmission is achieved through a revised self-attention mechanism applied to all tokens. Specifically, the SLPN model begins by generating a text embedding matrix $\mathbf{X}$ with $l$ rows, corresponding to a text containing $l$ tokens. Next, an MLP projects $\mathbf{X}$ into a latent embedding matrix $\mathbf{Z}$, also with $l$ rows. This $\mathbf{Z}$ matrix is used to compute $\boldsymbol{\beta}^{post,t}\in\mathbb{R}^{l \times c}$ through a normalizing flow (NF) operation. Each row of $\boldsymbol{\beta}^{post,t}$ represents the evidence count from the token's self-view, directly influencing the uncertainty of that token's prediction. In contrast to previous research, our approach also transmits uncertainty from all tokens within the text to obtain the transmitted uncertainty $\boldsymbol{\beta}^{trans,t}$. Finally, we sum $\boldsymbol{\beta}^{post,t}$ and $\boldsymbol{\beta}^{trans,t}$ to generate the semantic matrix $\bar{\mathbf{p}}^{agg}\in\mathbb{R}^{l \times c}$, representing the semantics of the $l$ tokens. (b) Revised self-attention mechanism.
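
The intuition behind Figures 1 and 2 can be sketched in plain Python. The captions do not specify the form of the revised self-attention, so this sketch substitutes ordinary scaled dot-product attention over the other tokens, with each token's own evidence row as the value; that substitution, the flat all-ones prior, the vacuity score, and all names (`transmit_evidence`, `vacuity`) are assumptions for illustration and not the authors' implementation.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transmit_evidence(z, beta_post):
    """Return an l x c matrix of evidence transmitted to each token.

    ASSUMPTION: plain scaled dot-product attention over the *other*
    tokens, using rows of z as queries/keys and beta_post rows as values.
    """
    l, d, c = len(z), len(z[0]), len(beta_post[0])
    beta_trans = []
    for i in range(l):
        others = [j for j in range(l) if j != i]
        if not others:  # single-token text: nothing to transmit
            beta_trans.append([0.0] * c)
            continue
        scores = [sum(a * b for a, b in zip(z[i], z[j])) / math.sqrt(d)
                  for j in others]
        weights = softmax(scores)
        beta_trans.append([sum(w * beta_post[j][k]
                               for w, j in zip(weights, others))
                           for k in range(c)])
    return beta_trans


def vacuity(beta_post_row, beta_trans_row):
    """Uncertainty c / alpha_0 with an assumed all-ones prior."""
    c = len(beta_post_row)
    alpha = [p + t + 1.0 for p, t in zip(beta_post_row, beta_trans_row)]
    return c / sum(alpha)


# Two hypothetical 2-token texts. Token 0 has identical own evidence in
# both; only the neighbour's evidence differs.
z = [[1.0, 0.0], [1.0, 0.0]]
confident = [[2.0, 1.0], [9.0, 0.0]]  # neighbour has strong evidence
uncertain = [[2.0, 1.0], [1.0, 0.0]]  # neighbour has weak evidence

u_own = vacuity(confident[0], [0.0, 0.0])
u_confident = vacuity(confident[0], transmit_evidence(z, confident)[0])
u_uncertain = vacuity(uncertain[0], transmit_evidence(z, uncertain)[0])
print(u_own, u_confident, u_uncertain)
```

Under these assumptions, the token's self-view uncertainty is 0.4 in both texts, mirroring the Figure 1 example, yet after transmission the token in the weak-evidence context remains more uncertain than the one in the strong-evidence context.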