Real World Conversational Entity Linking Requires More Than Zeroshots

Mohanna Hoveyda, Arjen P. de Vries, Maarten de Rijke, Faegheh Hasibi

TL;DR

This work examines the practical challenges of conversational entity linking (EL). It shows that zero-shot EL models struggle both to generalize to unseen, domain-specific KBs ($G$ trained on Wikipedia vs. $G'$ on Fandom) and to operate effectively in dialogue. The authors introduce an evaluation framework and a Reddit-based zero-shot conversational dataset aligned with Fandom domains, and use two zero-shot baselines (ELQ and BLINK) to quantify generalization and adaptability. Performance drops substantially on unfamiliar KBs and in conversational settings, with errors in both mention detection and disambiguation. Fine-tuning the end-to-end ELQ model on conversational data yields strong gains, underscoring the need for end-to-end, KB-aware approaches. The paper highlights a gap between existing zero-shot EL benchmarks and real-world conversational grounding, and provides public datasets to spur progress toward robust, KB-constrained EL for dialogue.

Abstract

Entity linking (EL) in conversations faces notable challenges in practical applications, primarily due to the scarcity of entity-annotated conversational datasets and sparse knowledge bases (KBs) containing domain-specific, long-tail entities. We designed targeted evaluation scenarios to measure the efficacy of EL models under resource constraints. Our evaluation employs two KBs: Fandom, exemplifying real-world EL complexities, and the widely used Wikipedia. First, we assess EL models' ability to generalize to a new, unfamiliar KB using Fandom and a novel zero-shot conversational entity linking dataset that we curated based on Reddit discussions on Fandom entities. We then evaluate the adaptability of EL models to conversational settings without prior training. Our results indicate that current zero-shot EL models falter when introduced to new, domain-specific KBs without prior training, significantly dropping in performance. Our findings reveal that previous evaluation approaches fall short of capturing real-world complexities for zero-shot EL, highlighting the necessity for new approaches to design and assess conversational EL models to adapt to limited resources. The evaluation setup and the dataset proposed in this research are made publicly available.
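
To make the distinction between mention detection errors and disambiguation errors concrete, the following is a minimal sketch of how one might score an end-to-end EL system at both levels. It is an illustration under stated assumptions, not the paper's evaluation code: the `Link` type, the `prf` and `score` helpers, the strict span-matching rule, and the toy gold and predicted annotations are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Link(NamedTuple):
    """Hypothetical annotation: a character span linked to a KB entity ID."""
    start: int
    end: int
    entity: str


def prf(n_correct: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Return precision, recall and F1, treating empty denominators as 0."""
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def score(gold: Iterable[Link], pred: Iterable[Link]) -> dict:
    """Score mention detection (spans only) and entity linking (span + entity).

    Assumes strict matching: a predicted span counts only if its boundaries
    equal a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(pred)
    gold_spans = {(g.start, g.end) for g in gold_set}
    pred_spans = {(p.start, p.end) for p in pred_set}
    return {
        "mention_detection": prf(
            len(gold_spans & pred_spans), len(pred_spans), len(gold_spans)
        ),
        "entity_linking": prf(
            len(gold_set & pred_set), len(pred_set), len(gold_set)
        ),
    }


# Toy data, invented for illustration only (not taken from the dataset).
gold = [Link(3, 9, "Geralt_of_Rivia"), Link(24, 28, "Ciri")]
pred = [
    Link(3, 9, "Geralt_of_Rivia"),  # correct span, correct entity
    Link(24, 28, "Ciri_(Gwent)"),   # correct span, wrong entity
    Link(32, 41, "Wild_Hunt"),      # spurious mention
]

result = score(gold, pred)
for level, (p, r, f) in result.items():
    print(f"{level}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy case the second prediction is a pure disambiguation error (it lowers only the linking score), while the third is a mention detection error that lowers precision at both levels.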

Paper Structure (17 sections, 1 equation, 5 tables)