Cost-Efficient RAG for Entity Matching with LLMs: A Blocking-based Exploration
Chuangtao Ma, Zeyu Zhang, Arijit Khan, Sebastian Schelter, Paul Groth
TL;DR
This paper tackles the high computational cost of retrieval-augmented generation (RAG) for large-scale entity matching (EM). It introduces CE-RAG4EM, a blocking-guided architecture that enables batch retrieval and batch inference, together with a unified framework for evaluating RAG variants in EM. In experiments across nine benchmarks and multiple backbones, CE-RAG4EM achieves competitive or superior accuracy while substantially reducing end-to-end latency. The paper also characterizes the trade-offs among retrieval granularity, blocking strategy, block size, and KG traversal. The findings yield practical recommendations for building scalable, cost-efficient RAG systems for data integration, including adapting the retrieval granularity and preferring small-to-medium LLMs when they are grounded with high-value contextual knowledge.
Abstract
Retrieval-augmented generation (RAG) enhances LLM reasoning in knowledge-intensive tasks, but existing RAG pipelines incur substantial retrieval and generation overhead when applied to large-scale entity matching. To address this limitation, we introduce CE-RAG4EM, a cost-efficient RAG architecture that reduces computation through blocking-based batch retrieval and generation. We also present a unified framework for analyzing and evaluating RAG systems for entity matching, focusing on blocking-aware optimizations and retrieval granularity. Extensive experiments suggest that CE-RAG4EM can achieve comparable or improved matching quality while substantially reducing end-to-end runtime relative to strong baselines. Our analysis further reveals that key configuration parameters introduce an inherent trade-off between performance and overhead, offering practical guidance for designing efficient and scalable RAG systems for entity matching and data integration.
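To make the idea of blocking-based batch retrieval and generation concrete, the following is a minimal sketch and not the CE-RAG4EM implementation. The abstract does not specify the blocking function, retriever, or prompt format, so every name here (`block_key`, `group_pairs_into_blocks`, `match_with_batching`, and the `retrieve` and `generate` callables) is a hypothetical stand-in. The sketch only illustrates why grouping candidate pairs into blocks can reduce the number of retrieval and generation calls from one per pair to one per block.

```python
from collections import defaultdict


def block_key(record):
    """Hypothetical blocking key: the first lowercase token of the record name."""
    tokens = record["name"].lower().split()
    return tokens[0] if tokens else ""


def group_pairs_into_blocks(pairs):
    """Group candidate pairs by the blocking key of their left record."""
    blocks = defaultdict(list)
    for left, right in pairs:
        blocks[block_key(left)].append((left, right))
    return dict(blocks)


def match_with_batching(pairs, retrieve, generate):
    """Issue one retrieval call and one generation call per block.

    `retrieve(query) -> str` and `generate(context, block_pairs) -> list[bool]`
    are caller-supplied stand-ins for a retriever and an LLM (assumptions).
    """
    results = []
    for key, block_pairs in group_pairs_into_blocks(pairs).items():
        context = retrieve(key)                     # shared by the whole block
        decisions = generate(context, block_pairs)  # one batched call
        results.extend(zip(block_pairs, decisions))
    return results


if __name__ == "__main__":
    calls = {"retrieve": 0, "generate": 0}

    def toy_retrieve(query):
        # Stub retriever that only counts calls and returns placeholder text.
        calls["retrieve"] += 1
        return f"context for {query}"

    def toy_generate(context, block_pairs):
        # Stub "LLM": a pair matches if the names share at least two tokens.
        calls["generate"] += 1
        return [
            len(set(l["name"].lower().split()) & set(r["name"].lower().split())) >= 2
            for l, r in block_pairs
        ]

    pairs = [
        ({"name": "Apple iPhone 13"}, {"name": "apple iphone 13 128GB"}),
        ({"name": "Apple iPhone 13"}, {"name": "Apple Watch"}),
        ({"name": "Sony WH-1000XM4"}, {"name": "Sony headphones WH1000XM4"}),
    ]
    for (left, right), is_match in match_with_batching(pairs, toy_retrieve, toy_generate):
        print(left["name"], "|", right["name"], "|", is_match)
    print(calls)
```

In this toy run, three candidate pairs fall into two blocks, so the stubs are each called twice rather than three times. The saving grows with block size, which is one way to read the trade-off between performance and overhead that the abstract attributes to the configuration parameters: larger blocks amortize more calls but share a coarser context across more pairs.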
