DistillER: Knowledge Distillation in Entity Resolution with Large Language Models
Alexandros Zeakis, George Papadakis, Dimitrios Skoutas, Manolis Koubarakis
TL;DR
DistillER addresses the high computational cost of LLM-based Entity Resolution (ER) by introducing a knowledge distillation framework that transfers knowledge from large Teacher models to small Student models without gold labels. It decomposes the problem into three dimensions (Data Selection, Knowledge Elicitation, and Distillation Algorithms) and evaluates data selection strategies, teacher types, and training paradigms, including supervised fine-tuning and reinforcement learning. Experiments on eight real-world ER datasets show that supervised fine-tuning on noisy labels from LLM Teachers yields the best overall performance, that reinforcement learning improves results only in some settings, and that explanations further enrich the supervision signal. Overall, DistillER achieves state-of-the-art or competitive effectiveness at a lower inference cost, enabling scalable ER driven by LLM-derived knowledge without annotated data.
Abstract
Recent advances in Entity Resolution (ER) have leveraged Large Language Models (LLMs), achieving strong performance but at the cost of substantial computational resources or high financial overhead. Existing LLM-based ER approaches operate either in unsupervised settings, relying on very large and costly models, or in supervised settings, requiring ground-truth annotations; this leaves a critical gap between time efficiency and effectiveness. To make LLM-powered ER more practical, we investigate Knowledge Distillation (KD) as a means of transferring knowledge from large, effective models (Teachers) to smaller, more efficient models (Students) without requiring gold labels. We introduce DistillER, the first framework that systematically bridges this gap along three dimensions: (i) Data Selection, where we study strategies for identifying informative subsets of data; (ii) Knowledge Elicitation, where we compare single- and multi-teacher settings across LLMs and smaller language models (SLMs); and (iii) Distillation Algorithms, where we evaluate supervised fine-tuning and reinforcement learning approaches. Our experiments reveal that supervised fine-tuning of Students on noisy labels generated by LLM Teachers consistently outperforms alternative KD strategies, while also enabling high-quality explanation generation. Finally, we benchmark DistillER against established supervised and unsupervised ER methods based on LLMs and SLMs, demonstrating significant improvements in both effectiveness and efficiency.
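The core idea of training a Student on Teacher-generated (noisy) labels instead of gold annotations can be illustrated with a minimal sketch. It assumes a binary match/non-match formulation over candidate record pairs, uses majority voting as one possible way to combine several Teachers (the abstract does not say how multi-teacher labels are aggregated), and replaces the LLM Teachers with simple rule-based stand-ins so the example runs offline. All names (`majority_label`, `build_sft_dataset`, `serialize`, the teacher functions, and the prompt template) are hypothetical and not taken from DistillER.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]
# A Teacher maps a record pair to a match (True) / non-match (False) decision.
Teacher = Callable[[Record, Record], bool]


def serialize(record: Record) -> str:
    """Flatten a record into 'attr: value' text, with attributes sorted by name."""
    return " | ".join(f"{k}: {v}" for k, v in sorted(record.items()))


# Hypothetical stand-ins for LLM Teachers; a real setup would query a model.
def exact_title_teacher(a: Record, b: Record) -> bool:
    return a.get("title", "").lower() == b.get("title", "").lower()


def jaccard_title_teacher(a: Record, b: Record) -> bool:
    ta = set(a.get("title", "").lower().split())
    tb = set(b.get("title", "").lower().split())
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= 0.5


def same_price_teacher(a: Record, b: Record) -> bool:
    return a.get("price") == b.get("price")


def majority_label(pair: Pair, teachers: List[Teacher]) -> bool:
    """Aggregate Teacher votes; ties count as non-match (an assumption)."""
    votes = Counter(teacher(*pair) for teacher in teachers)
    return votes[True] > votes[False]


def build_sft_dataset(pairs: List[Pair], teachers: List[Teacher]) -> List[Dict[str, str]]:
    """Turn unlabeled pairs into prompt/completion examples using noisy Teacher labels."""
    dataset = []
    for left, right in pairs:
        label = majority_label((left, right), teachers)
        prompt = (
            "Do these two records refer to the same entity?\n"
            f"A: {serialize(left)}\nB: {serialize(right)}\nAnswer:"
        )
        dataset.append({"prompt": prompt, "completion": " yes" if label else " no"})
    return dataset
```

For example, labeling two candidate pairs with the three stand-in Teachers yields one positive and one negative training example; a Student model would then be fine-tuned on these prompt/completion pairs with a standard supervised fine-tuning loop:

```python
pairs = [
    ({"title": "Apple iPhone 13", "price": "799"}, {"title": "apple iphone 13", "price": "799"}),
    ({"title": "Apple iPhone 13", "price": "799"}, {"title": "Samsung Galaxy S21", "price": "699"}),
]
teachers = [exact_title_teacher, jaccard_title_teacher, same_price_teacher]
for example in build_sft_dataset(pairs, teachers):
    print(example["completion"])
```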
