Cost-Efficient RAG for Entity Matching with LLMs: A Blocking-based Exploration

Chuangtao Ma, Zeyu Zhang, Arijit Khan, Sebastian Schelter, Paul Groth

TL;DR

This paper tackles the high computational cost of retrieval-augmented generation (RAG) for large-scale entity matching (EM). It introduces CE-RAG4EM, a blocking-guided architecture that enables batch retrieval and batch inference, together with a unified framework for evaluating RAG variants in EM. In experiments across nine benchmarks and multiple backbones, CE-RAG4EM achieves competitive or superior accuracy while substantially reducing end-to-end latency; the paper also characterizes trade-offs among retrieval granularity, blocking strategies, block size, and knowledge graph (KG) traversal. The findings offer practical recommendations for building scalable, cost-efficient RAG systems for data integration tasks, including adaptive retrieval granularity and a preference for small-to-medium LLMs when grounded with high-value contextual knowledge.

Abstract

Retrieval-augmented generation (RAG) enhances LLM reasoning in knowledge-intensive tasks, but existing RAG pipelines incur substantial retrieval and generation overhead when applied to large-scale entity matching. To address this limitation, we introduce CE-RAG4EM, a cost-efficient RAG architecture that reduces computation through blocking-based batch retrieval and generation. We also present a unified framework for analyzing and evaluating RAG systems for entity matching, focusing on blocking-aware optimizations and retrieval granularity. Extensive experiments suggest that CE-RAG4EM can achieve comparable or improved matching quality while substantially reducing end-to-end runtime relative to strong baselines. Our analysis further reveals that key configuration parameters introduce an inherent trade-off between performance and overhead, offering practical guidance for designing efficient and scalable RAG systems for entity matching and data integration.
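
The idea of blocking-based batch retrieval described above can be illustrated with a minimal sketch. This is not the authors' implementation: the blocking key (first token of the left record), the function names `blocking_key`, `build_blocks`, and `match_with_batch_retrieval`, and the toy retriever and decision function are all assumptions made for illustration. The sketch only shows how grouping candidate pairs into blocks lets one retrieval call be shared by every pair in the block, instead of issuing one call per pair.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (left record, right record), both serialized as text


def blocking_key(pair: Pair) -> str:
    """Hypothetical blocking key: first lowercase token of the left record."""
    tokens = pair[0].lower().split()
    return tokens[0] if tokens else ""


def build_blocks(pairs: List[Pair]) -> Dict[str, List[Pair]]:
    """Group candidate pairs that share the same blocking key."""
    blocks: Dict[str, List[Pair]] = defaultdict(list)
    for pair in pairs:
        blocks[blocking_key(pair)].append(pair)
    return dict(blocks)


def match_with_batch_retrieval(
    pairs: List[Pair],
    retrieve: Callable[[str], str],
    decide: Callable[[Pair, str], bool],
) -> Tuple[Dict[Pair, bool], int]:
    """Retrieve context once per block and reuse it for every pair in it.

    `retrieve` and `decide` are placeholders for a knowledge retriever and
    an LLM call; both are assumptions of this sketch.
    """
    results: Dict[Pair, bool] = {}
    retrieval_calls = 0
    for key, block in build_blocks(pairs).items():
        context = retrieve(key)  # one retrieval shared by the whole block
        retrieval_calls += 1
        for pair in block:
            results[pair] = decide(pair, context)
    return results, retrieval_calls


# Toy stand-ins so the sketch runs without any external service.
def toy_retrieve(key: str) -> str:
    return f"facts about {key}"


def toy_decide(pair: Pair, context: str) -> bool:
    return pair[0].lower() == pair[1].lower()


pairs = [
    ("Apple iPhone 12", "apple iphone 12"),
    ("Apple iPad", "Apple Watch"),
    ("Sony TV", "sony tv"),
]
results, calls = match_with_batch_retrieval(pairs, toy_retrieve, toy_decide)
print(calls, "retrieval calls for", len(pairs), "pairs")
```

In this toy run, three pairs fall into two blocks, so the retriever is called twice rather than three times. With larger blocks the saving grows, while the shared context becomes less specific to each pair, which is one way to picture the performance/overhead trade-off the abstract mentions.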

Paper Structure (29 sections, 7 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Vanilla RAG vs. CE-RAG4EM for entity matching: (a) per-query retrieval/generation; (b) blocking-based batch retrieval/generation. Matched records share the same color.
  • Figure 2: Overview of CE-RAG4EM. The framework is composed of five phases (a)-(e), which are detailed in the methodology section. A check mark indicates that the LLM responds with Yes for the given EM query, while a cross mark indicates that the LLM responds with No.
  • Figure 3: Efficiency comparison of CE-RAG4EM against (a) LLM-EM and (b) PLM baselines.
  • Figure 4: Exp-2 (Retrieval Granularity). PID/QID: node-level retrieval in CE-RAG4EM-BR. EXP/BFS: KG-triple context construction in CE-KG-RAG4EM-BR via expansion or BFS. (a) F1 by dataset (sorted by KG-variant advantage). (b) Mean context-construction time per entity pair.
  • Figure 5: Exp-3 (Batching vs. per-query). F1/Prec./Rec. and end-to-end time per pair for RAG4EM, CE-RAG4EM-BR (retrieval by blocks), and CE-RAG4EM-BG (generation by blocks). Batched costs uniformly amortized per (sub-)block.
  • ...and 5 more figures

Theorems & Definitions (7)

  • Definition 1: Entity Matching Problem
  • Definition 2: LLM-based Entity Matching
  • Definition 3: Knowledge Graph
  • Definition 4: Knowledge Retriever and Triple Search
  • Definition 5: Graph-Aware Serializer
  • Definition 6: RAG and KG-based Entity Matching
  • Definition 7: RAG with Batch Input and Inference