Efficient Uncertainty Estimation for LLM-based Entity Linking in Tabular Data

Carlo Bono, Federico Belotti, Matteo Palmonari

TL;DR

The paper addresses the challenge of obtaining reliable uncertainty estimates for LLM-based entity linking (EL) in tabular data without incurring the high cost of multi-shot inference. The authors introduce a self-supervised uncertainty regressor h_phi that predicts multi-shot uncertainty from single-shot, token-level observables. This yields uncertainty signals close to the multi-shot ones at a fraction of the cost. Across multiple instruction-tuned LLMs and a rich candidate set, the approach effectively identifies low-accuracy outputs and enables uncertainty-guided correction with substantial budget savings, while requiring only a modest warm-up phase. The method is model-agnostic, relies on token-probability signals, and offers a practical path to integrating uncertainty into EL pipelines for scalable, reliable data enrichment.
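
The regressor idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. The source does not specify the model family of h_phi or its exact features. The four summary statistics below (mean and minimum token log-probability, mean and maximum per-step entropy) and the closed-form ridge regression are therefore hypothetical choices, as are the names `token_features`, `fit_ridge` and `predict`. The training targets are multi-shot uncertainty scores, for example PE over N = 10 generations, computed on a small warm-up set. This is what makes the scheme self-supervised: no gold entity labels are needed.

```python
# Hypothetical sketch of a single-shot uncertainty regressor h_phi.
# Assumptions (not from the paper): the feature set and linear ridge regression.


def token_features(token_logprobs, step_entropies):
    """Summarize one greedy generation into a fixed-length feature vector."""
    n = len(token_logprobs)
    return [
        1.0,                           # bias term
        sum(token_logprobs) / n,       # mean log-prob of the generated tokens
        min(token_logprobs),           # least confident generated token
        sum(step_entropies) / n,       # mean next-token entropy
        max(step_entropies),           # peak next-token entropy
    ]


def fit_ridge(X, y, lam=1e-3):
    """Solve (X^T X + lam*I) w = X^T y with Gauss-Jordan elimination."""
    d = len(X[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    M = [
        [sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(d)
    ]
    for col in range(d):
        # Partial pivoting for numerical stability.
        piv = max(range(col, d), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(d):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][d] for i in range(d)]


def predict(w, features):
    """h_phi(x): predicted multi-shot uncertainty from single-shot features."""
    return sum(wi * xi for wi, xi in zip(w, features))
```

In use, `fit_ridge` would be called once on warm-up prompts whose targets come from N sampled generations. Afterwards, each new prompt needs only one generation, and `predict(w, token_features(...))` stands in for the N-sample estimate. For simplicity, the penalty also shrinks the bias term. With the small default `lam`, this has little effect.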

Abstract

Linking textual values in tabular data to their corresponding entities in a Knowledge Base is a core task across a variety of data integration and enrichment applications. Although Large Language Models (LLMs) have shown state-of-the-art performance in Entity Linking (EL) tasks, their deployment in real-world scenarios requires not only accurate predictions but also reliable uncertainty estimates. Such estimates require resource-demanding multi-shot inference, which seriously limits their practical applicability. As a more efficient alternative, we investigate a self-supervised approach for estimating uncertainty from single-shot LLM outputs using token-level features, reducing the need for multiple generations. Evaluation on an EL task over tabular data across multiple LLMs shows that the resulting uncertainty estimates are highly effective in detecting low-accuracy outputs. This is achieved at a fraction of the computational cost. The method thus offers a practical, cost-effective way to integrate uncertainty estimation into LLM-based EL workflows with limited computational overhead.
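
The multi-shot quantity that the single-shot approach aims to approximate can be estimated as follows. This is a minimal sketch assuming the common Monte Carlo estimator of predictive entropy, PE(x) ≈ -(1/N) Σ_i log p(s_i | x). Here s_1, ..., s_N are sampled answers, and log p(s_i | x) is the sum of their token log-probabilities. The abstract does not state which estimator or normalization the authors use. The optional `max_generations` and `max_tokens` arguments compute a truncated PE from the first generations and tokens only. This is a hypothetical interface loosely mirroring the truncation studied in Figure 5. `cluster_entropy` is a frequency-based entropy over clusters of equivalent answers, the discrete variant of semantic entropy. Whether it matches the SE in the figures is an assumption. It also assumes, for illustration, that two EL answers are equivalent when they name the same entity identifier.

```python
import math
from collections import Counter


def predictive_entropy(sampled_logprobs, max_generations=None, max_tokens=None):
    """Monte Carlo predictive entropy: minus the mean sequence log-likelihood.

    sampled_logprobs: one list of per-token log-probs for each of N sampled answers.
    max_generations / max_tokens: optional truncation (None keeps everything).
    """
    gens = sampled_logprobs[:max_generations]
    seq_loglik = [sum(lps[:max_tokens]) for lps in gens]
    return -sum(seq_loglik) / len(seq_loglik)


def cluster_entropy(answers):
    """Entropy of the empirical distribution over distinct answers.

    Hypothetical equivalence rule for EL: answers naming the same entity ID
    fall in the same cluster.
    """
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())
```

Both quantities need N generations per input. This cost is the one the single-shot regressor is meant to avoid at inference time.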

Paper Structure

This paper contains 31 sections, 6 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: (left) Joint distribution of answer-level accuracy and Predictive Entropy (PE) per input for Llama-3.1-8B-Instruct, with marginal distributions on the axes. (right) We define the positive class as “flag for manual review”. Thresholding uncertainty yields: green = correctly flagged low-accuracy cases (true positives), yellow = incorrectly flagged high-accuracy cases (false positives), red = consistently wrong but low-uncertainty cases (false negatives; not recoverable by uncertainty thresholding).
  • Figure 2: ROC analysis targeting low-accuracy ($<0.5$) cases for selected models. Positive = “flag for manual review”. In the legend, the notation "Target(Segment, Observable)" indicates that the target variable Target was predicted using a regressor trained on Observable features extracted from the Segment portion of the prompt. Dashed lines labeled PE/SE Baseline (a posteriori) show the multi-generation PE and SE computed over $N=10$ generations.
  • Figure 3: Dataset accuracy as a function of the budget $B$ of corrected items, where items are ranked by the measures shown in the legend. The shaded areas correspond to 95% C.I.s estimated via 1,000 bootstrap resampling iterations. Each curve illustrates how accuracy improves as more high-uncertainty prompts are corrected. In the legend, the notation "Target(Segment, Observable)" indicates that the target variable Target was predicted using a regressor trained on Observable features extracted from the Segment portion of the prompt. PE/SE Baseline (a posteriori) denotes the multi-generation PE and SE computed over $N=10$ generations.
  • Figure 4: Spearman correlation ($\rho$) with multi-generation PE when training the proposed method on an increasing number of training cases, averaged over 10-fold cross-validation.
  • Figure 5: Spearman correlation ($\rho$) with PE, obtained with $N=10$ generations from TableLlama using all the tokens, as a function of the number of tokens (x-axis) and generations (y-axis) used to compute a truncated PE.
  • ...and 4 more figures
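
The helpers below sketch the evaluation protocol described in the captions. The protocol flags inputs with accuracy below 0.5, corrects a budget B of the most uncertain items, and rank-correlates estimates with multi-generation PE. This is an illustrative sketch, and the function names are hypothetical. AUROC is computed with the Mann-Whitney formulation, and Spearman's ρ is the Pearson correlation of average ranks. `accuracy_vs_budget` assumes that a manually corrected item becomes fully correct (accuracy 1.0), which the captions do not state explicitly. Bootstrap confidence intervals are omitted.

```python
import math


def auroc(scores, positives):
    """AUROC via Mann-Whitney: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_vs_budget(uncertainty, accuracy, budgets):
    """Dataset accuracy after correcting the B most uncertain items (assumed to become correct)."""
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)
    curve = []
    for b in budgets:
        fixed = set(order[:b])
        acc = [1.0 if i in fixed else a for i, a in enumerate(accuracy)]
        curve.append(sum(acc) / len(acc))
    return curve


def _average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For instance, `flag = [a < 0.5 for a in acc]` encodes the positive class “flag for manual review” of Figures 1 and 2. With it, `auroc(unc, flag)` summarizes one ROC curve, and `accuracy_vs_budget(unc, acc, budgets)` traces one curve of Figure 3. The quadratic AUROC loop is fine for a sketch but would be replaced by a rank-based formula on large datasets.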