Efficient Uncertainty Estimation for LLM-based Entity Linking in Tabular Data
Carlo Bono, Federico Belotti, Matteo Palmonari
TL;DR
The paper addresses the challenge of obtaining reliable uncertainty estimates for LLM-based entity linking (EL) in tabular data without incurring the high cost of multi-shot inference. The authors introduce a self-supervised uncertainty regressor h_phi that predicts multi-shot uncertainty from single-shot, token-level observables, yielding uncertainty signals close to the multi-shot ones at a fraction of the cost. Across multiple instruction-tuned LLMs and a rich candidate set, the approach effectively identifies low-accuracy outputs and enables uncertainty-guided correction with substantial budget savings, while requiring only a modest warm-up phase. The method is model-agnostic, relies on token-probability signals, and offers a practical path to integrating uncertainty into EL pipelines for scalable, reliable data enrichment.
Abstract
Linking textual values in tabular data to their corresponding entities in a Knowledge Base is a core task across a variety of data integration and enrichment applications. Although Large Language Models (LLMs) have shown state-of-the-art performance in Entity Linking (EL) tasks, deploying them in real-world scenarios requires not only accurate predictions but also reliable uncertainty estimates. Obtaining such estimates requires resource-demanding multi-shot inference, which severely limits their practical applicability. As a more efficient alternative, we investigate a self-supervised approach that estimates uncertainty from single-shot LLM outputs using token-level features, reducing the need for multiple generations. We evaluate it on an EL task on tabular data across multiple LLMs, showing that the resulting uncertainty estimates are highly effective at detecting low-accuracy outputs. This is achieved at a fraction of the computational cost, offering a practical, cost-effective way to integrate uncertainty estimation into LLM-based EL workflows.
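The core idea, fitting a regressor on single-shot token-level features so that it predicts a multi-shot uncertainty target, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the multi-shot target is the variation ratio (the share of k sampled answers that disagree with the most frequent one), the single-shot features are the mean and maximum token negative log-likelihood of the greedy answer, and h_phi is a plain ridge regression. All function names, entity IDs, and numbers are hypothetical. The last step ranks new items by predicted uncertainty and spends a fixed re-inference budget on the most uncertain ones, in the spirit of the uncertainty-guided correction described in the TL;DR.

```python
from collections import Counter
from typing import List, Sequence


def variation_ratio(samples: Sequence[str]) -> float:
    """Multi-shot target (assumed): share of sampled answers that disagree with the mode."""
    top_count = Counter(samples).most_common(1)[0][1]
    return 1.0 - top_count / len(samples)


def token_features(logprobs: Sequence[float]) -> List[float]:
    """Single-shot features (assumed) from the token log-probs of one greedy answer.

    Returns [bias, mean token NLL, max token NLL]; the leading 1.0 is the intercept.
    """
    mean_nll = -sum(logprobs) / len(logprobs)
    max_nll = -min(logprobs)
    return [1.0, mean_nll, max_nll]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(xs: List[List[float]], ys: Sequence[float], lam: float = 1e-2) -> List[float]:
    """Hypothetical stand-in for h_phi: closed-form ridge regression, bias not penalized."""
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (lam if i == j and i > 0 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted multi-shot uncertainty, clipped to [0, 1]."""
    return min(1.0, max(0.0, sum(wi * xi for wi, xi in zip(w, x))))


def select_for_correction(scores: Sequence[float], budget: int) -> List[int]:
    """Indices of the `budget` most uncertain items, to be re-run with multi-shot inference."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]


# Hypothetical warm-up set: single-shot token log-probs plus k=5 multi-shot samples.
warmup = [
    ([-0.01, -0.02], ["Q90"] * 5),
    ([-0.05, -0.10], ["Q90"] * 5),
    ([-0.90, -1.60], ["Q1", "Q2", "Q1", "Q3", "Q1"]),
    ([-1.20, -2.30], ["Q7", "Q8", "Q9", "Q7", "Q10"]),
    ([-0.40, -0.70], ["Q5", "Q5", "Q5", "Q6", "Q5"]),
]
xs = [token_features(lp) for lp, _ in warmup]
ys = [variation_ratio(s) for _, s in warmup]
w = fit_ridge(xs, ys)

# After warm-up, only single-shot outputs are needed to score new cells.
new_items = [[-0.02, -0.03], [-1.00, -2.00], [-0.30, -0.50]]
scores = [predict(w, token_features(lp)) for lp in new_items]
selected = select_for_correction(scores, budget=1)
```

Ridge regression is used here only as a dependency-free placeholder; any regressor that maps single-shot features to the multi-shot target would occupy the same slot. In a real pipeline the token log-probabilities would come from whatever serving framework exposes them for the chosen LLM, and multi-shot sampling would be needed only during the warm-up phase and for the items selected for correction.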
