Learning Dense Representations for Entity Retrieval

Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, Diego Garcia-Olano

TL;DR

This work tackles entity linking by replacing the traditional alias-table and re-ranking pipeline with a fully learned, single-stage retrieval model. It introduces DEER, a dual-encoder architecture that learns dense representations of mentions and entities in a shared vector space and retrieves candidates by nearest-neighbor search, with no cross-attention between mention and entity representations. Trained on Wikipedia anchor text with unsupervised hard negative mining, DEER achieves competitive or superior recall on TACKBP-2010 and Wikinews and is substantially faster, especially when combined with approximate nearest neighbor search. The results demonstrate that a dense, retrieval-only approach is practical for large-scale knowledge bases, and they point to extensions with stronger encoders and cross-lingual data.

Abstract

We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search. Unlike prior work, this setup does not rely on an alias table followed by a re-ranker, and is thus the first fully learned entity retrieval model. We show that our dual encoder, trained using only anchor-text links in Wikipedia, outperforms discrete alias table and BM25 baselines, and is competitive with the best comparable results on the standard TACKBP-2010 dataset. In addition, it can retrieve candidates extremely fast, and generalizes well to a new dataset derived from Wikinews. On the modeling side, we demonstrate the dramatic value of an unsupervised negative mining algorithm for this task.
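
To make the retrieval setup in the abstract concrete, here is a minimal sketch in Python of scoring entities against a mention in a shared vector space, followed by one round of negative mining. It is not the authors' implementation. The function `toy_encode` is a hypothetical stand-in for the learned encoder towers (it hashes character trigrams instead of using trained weights), the search is an exact brute-force scan rather than approximate nearest neighbor search, and the rule that every non-gold entity in the top `k` counts as a hard negative is an assumption of this sketch. The entity names and the dimension `DIM` are illustrative only.

```python
import math
import zlib

DIM = 64  # illustrative embedding size, not a value from the paper


def toy_encode(text):
    """Hypothetical stand-in for a learned encoder tower.

    Hashes character trigrams into DIM buckets and L2-normalizes the result.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(mention, entity_index, k=2):
    """Exact (brute-force) nearest-neighbor search by cosine similarity."""
    query = toy_encode(mention)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [
        (sum(q * e for q, e in zip(query, vec)), name)
        for name, vec in entity_index.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def mine_hard_negatives(pairs, entity_index, k=2):
    """One mining round: non-gold entities in the top k become hard negatives."""
    return {
        (mention, gold): [e for e in retrieve(mention, entity_index, k) if e != gold]
        for mention, gold in pairs
    }


# Entities are encoded once, independently of any mention.
ENTITIES = ["Paris, France", "Paris Hilton", "London, England"]
ENTITY_INDEX = {name: toy_encode(name) for name in ENTITIES}

if __name__ == "__main__":
    print(retrieve("Paris", ENTITY_INDEX))
    print(mine_hard_negatives([("Paris", "Paris, France")], ENTITY_INDEX))
```

Because mentions and entities are encoded separately, the entity vectors can be computed once and indexed ahead of time. In a larger system the linear scan in `retrieve` would presumably be replaced by an approximate nearest neighbor index, and the mined negatives would be added to the training data for the next round.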

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Architecture of the dual encoder model for retrieval (a). Common component architectures are shown for (b) text input, (c) sparse ID input, and (d) compound input joining multiple encoder outputs. Note that all text encoders share a common set of embeddings.
  • Figure 2: Recall@1 improvement for successive iterations of hard negative mining for Wikinews (solid) and TACKBP-2010 (dashed).
  • Figure 3: A 2D projection of cities, bands, and people embeddings (using t-SNE), color coded by their category.