Evaluation of Semantic Search and its Role in Retrieved-Augmented-Generation (RAG) for Arabic Language

Ali Mahboub; Muhy Eddin Za'ter; Bashar Al-Rfooh; Yazan Estaitia; Adnan Jaljuli; Asma Hakouz

TL;DR

This work addresses the lack of benchmarks for semantic search in Arabic and examines how semantic search affects retrieval-augmented generation (RAG) for Arabic question answering. It proposes a simple yet effective benchmark and evaluation pipeline. These include a dataset of 2030 Arabic customer support summaries and 406 GPT-4-generated queries with relevance labels, evaluated with standard IR metrics such as $nDCG$, $MRR$, and $mAP$. An extensive comparison of five encoders for Arabic shows that Paraphrase Multilingual MPNet gives the best semantic search performance, although larger embedding sizes increase computational cost. The results show that integrating semantic search into RAG can improve answer quality and allow shorter prompts, which improves efficiency. They also point to the need for further study of encoder selection and Arabic-specific RAG optimization.
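The three retrieval metrics named above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the paper's implementation. Each query's ranked results are given as a list of relevance labels (0 = not relevant, higher = relevant). nDCG uses the linear gain rel_i / log2(i + 1), with the ideal ranking taken from the same list. Average precision divides by the number of relevant items found in the list unless a total is supplied. The example rankings are hypothetical toy data.

```python
import math
from typing import Callable, List, Optional, Sequence


def reciprocal_rank(rels: Sequence[int]) -> float:
    """1 / (1-based rank of the first relevant result), or 0.0 if none is relevant."""
    for rank, rel in enumerate(rels, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def average_precision(rels: Sequence[int], total_relevant: Optional[int] = None) -> float:
    """Sum of precision@k at each relevant rank k, divided by the number of relevant items.

    Assumption: if total_relevant is None, the list is taken to contain every
    relevant item, so the divisor is the number of relevant labels it holds.
    """
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(rels, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / k
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom > 0 else 0.0


def dcg(rels: Sequence[int]) -> float:
    """Discounted cumulative gain with linear gain rel_i / log2(i + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(rels: Sequence[int]) -> float:
    """DCG normalized by the DCG of the same labels sorted in ideal order."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


def mean_over_queries(metric: Callable[[Sequence[int]], float], runs: List[Sequence[int]]) -> float:
    """Average a per-query metric over all queries (gives MRR, mAP, mean nDCG)."""
    return sum(metric(r) for r in runs) / len(runs)


# Hypothetical relevance labels for the ranked results of two queries.
runs = [[0, 1, 0], [1, 0, 1]]
print(f"MRR  = {mean_over_queries(reciprocal_rank, runs):.4f}")
print(f"mAP  = {mean_over_queries(average_precision, runs):.4f}")
print(f"nDCG = {mean_over_queries(ndcg, runs):.4f}")
```

Averaging reciprocal rank and average precision over queries yields MRR and mAP. Graded labels can be passed to `ndcg` unchanged, whereas the other two functions only check whether a label is positive.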

Abstract

Recent advances in machine learning and deep learning have brought forth the concept of semantic similarity, which has proven highly beneficial in many applications and has largely replaced keyword search. However, evaluating semantic similarity and searching for a specific query across many documents remain complicated tasks. This complexity stems from the multifaceted nature of the task and the lack of standard benchmarks, and these challenges are further amplified for the Arabic language. This paper aims to establish a straightforward yet effective benchmark for semantic search in Arabic. Moreover, to precisely evaluate the effectiveness of these metrics and the dataset, we assess semantic search within the framework of retrieval-augmented generation (RAG).
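The search step the abstract refers to can be illustrated with a minimal sketch. It finds the documents most relevant to a query by representing both as embedding vectors and ranking the documents by cosine similarity to the query. The three-dimensional vectors and document IDs below are hypothetical stand-ins. In the benchmark, the vectors would come from one of the evaluated encoders (for example Paraphrase Multilingual MPNet), and the paper may use a different similarity function or index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs sorted by descending cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for encoder outputs.
doc_vecs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_delay": [0.1, 0.9, 0.1],
    "password_reset": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

for doc_id, score in semantic_search(query_vec, doc_vecs, top_k=2):
    print(f"{doc_id}: {score:.3f}")
```

The ranked IDs produced this way are what the retrieval metrics score against each query's relevance labels. Swapping encoders changes only how the vectors are produced, not the ranking logic.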

Paper Structure (17 sections, 5 equations, 1 figure, 2 tables)

Introduction
Literature review
Evaluation Methodology
Dataset Generation
Evaluation Metrics
Normalized Discounted Cumulative Gain (nDCG)
Mean Reciprocal Rank (MRR)
Mean Average Precision (mAP)
Semantic Search Approach
Assessment of Encoders
RAG Evaluation Setup
Dataset Creation
RAG Pipeline Implementation
Results
Semantic Search Evaluation results
...and 2 more sections

Figures (1)

Figure 1: Retrieved-Augmented-Generation with Semantic Search Pipeline
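The pipeline in Figure 1 can be summarized in three steps: embed the question, retrieve the most similar summaries, and place only those in the prompt sent to the language model. The sketch below assumes a hypothetical prompt template, toy English documents (the benchmark uses Arabic summaries), and toy vectors, and it omits the actual LLM call. It only illustrates why retrieving the top-k passages yields a shorter prompt than including every document.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_rag_prompt(
    question: str,
    question_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    top_k: int = 2,
) -> str:
    """Retrieve the top_k most similar docs and insert them into a hypothetical prompt template."""
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(question_vec, doc_vecs[i]),
        reverse=True,
    )
    context = "\n".join(f"- {docs[i]}" for i in ranked[:top_k])
    return (
        "Answer the customer's question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical toy knowledge base and embeddings.
docs = [
    "Refund requests are reviewed by the billing team.",
    "Delayed shipments can be tracked from the orders page.",
    "Passwords can be reset from the login page.",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
question = "How do I get a refund?"
question_vec = [0.8, 0.2, 0.1]

short_prompt = build_rag_prompt(question, question_vec, docs, doc_vecs, top_k=2)
full_prompt = build_rag_prompt(question, question_vec, docs, doc_vecs, top_k=len(docs))
print(short_prompt)
print(len(short_prompt), "vs", len(full_prompt), "characters")
# The generation step (sending short_prompt to an LLM) is omitted here.
```

Only the retrieved passages enter the prompt, which is the mechanism behind the shorter prompts mentioned in the TL;DR. The choice of `top_k` trades context coverage against prompt length.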