Semantic Search Evaluation

Chujie Zheng, Jeffrey Wang, Shuqian Albee Zhang, Anand Kishore, Siddharth Singh

TL;DR

This paper proposes a novel method for evaluating a content search system by measuring the semantic match between a query and the results the system returns. It introduces a metric called "on-topic rate", the percentage of results that are relevant to the query.

Abstract

We propose a novel method for evaluating the performance of a content search system that measures the semantic match between a query and the results returned by the search system. We introduce a metric called "on-topic rate" to measure the percentage of results that are relevant to the query. To compute it, we design a pipeline that defines a golden query set, retrieves the top K results for each query, and sends calls to GPT 3.5 with formulated prompts. Our semantic evaluation pipeline helps identify common failure patterns and set goals against the metric for relevance improvements.
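The pipeline described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the prompt wording in `build_prompt`, the yes/no answer format, the `complete` callable standing in for a call to a model such as GPT 3.5, and the choice to pool all query-result pairs when computing the rate (rather than averaging per query).

```python
from typing import Callable, List

# Hypothetical type: takes (query, result_text), returns True if the result is on-topic.
Judge = Callable[[str, str], bool]


def build_prompt(query: str, result: str) -> str:
    """Hypothetical prompt template; the paper's actual prompts are not reproduced here."""
    return (
        f"Query: {query}\n"
        f"Result: {result}\n"
        "Is the result on-topic for the query? Answer yes or no."
    )


def make_llm_judge(complete: Callable[[str], str]) -> Judge:
    """Wrap a text-completion callable (assumed: prompt in, answer text out) as a Judge."""
    def judge(query: str, result: str) -> bool:
        answer = complete(build_prompt(query, result))
        return answer.strip().lower().startswith("yes")
    return judge


def on_topic_rate(
    golden_queries: List[str],
    retrieve: Callable[[str], List[str]],
    judge: Judge,
    k: int,
) -> float:
    """Fraction of top-k results judged on-topic, pooled over all golden queries."""
    total = 0
    on_topic = 0
    for query in golden_queries:
        for result in retrieve(query)[:k]:
            total += 1
            if judge(query, result):
                on_topic += 1
    # With no results to judge, report 0.0 rather than dividing by zero.
    return on_topic / total if total else 0.0


# Stand-ins for a real search system and a real model call, for demonstration only.
FAKE_INDEX = {
    "python tutorial": ["Learn Python basics", "Python for data science", "Best hiking trails"],
    "hiking boots": ["Top hiking boots reviewed", "Python snake facts", "Hiking gear checklist"],
}


def fake_retrieve(query: str) -> List[str]:
    return FAKE_INDEX[query]


def fake_complete(prompt: str) -> str:
    # Toy rule: answer "yes" if the first query word appears in the result line.
    lines = prompt.splitlines()
    query = lines[0][len("Query: "):]
    result = lines[1][len("Result: "):]
    return "yes" if query.split()[0].lower() in result.lower() else "no"


rate = on_topic_rate(list(FAKE_INDEX), fake_retrieve, make_llm_judge(fake_complete), k=2)
```

In practice `fake_retrieve` would be replaced by the search system under evaluation and `fake_complete` by a real model call. Keeping the judge behind a plain callable makes it straightforward to log each individual verdict, which is one way the failure patterns mentioned in the abstract could be inspected.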

Paper Structure

This paper contains 21 sections, 2 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Semantic evaluation pipeline