Semantic Search Evaluation
Chujie Zheng, Jeffrey Wang, Shuqian Albee Zhang, Anand Kishore, Siddharth Singh
TL;DR
This paper proposes a novel method for evaluating a content search system by measuring the semantic match between a query and the results the system returns. It introduces a metric called "on-topic rate", which measures the percentage of results that are relevant to the query.
Abstract
We propose a novel method for evaluating a content search system by measuring the semantic match between a query and the results the system returns. We introduce a metric called "on-topic rate" to measure the percentage of results that are relevant to the query. To compute it, we design a pipeline that defines a golden query set, retrieves the top K results for each query, and sends calls to GPT 3.5 with formulated prompts. Our semantic evaluation pipeline helps identify common failure patterns and set goals against the metric for relevance improvements.
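As a rough illustration of the pipeline described above, the following Python sketch computes an on-topic rate over a golden query set. It is an assumption-laden sketch, not the authors' implementation: the prompt wording, the function names (`build_prompt`, `on_topic_rate`, `stub_search`, `stub_judge`), and the toy index are all hypothetical, and the LLM call is replaced by a keyword-overlap stub so the example runs offline. It also assumes the rate is pooled over all (query, result) pairs, which the abstract does not specify.

```python
from typing import Callable, Dict, List


def build_prompt(query: str, result: str) -> str:
    """Format a yes/no relevance question for a judge (hypothetical wording)."""
    return (
        "Is the following search result on-topic for the query? "
        "Answer 'yes' or 'no'.\n"
        f"Query: {query}\n"
        f"Result: {result}"
    )


def on_topic_rate(
    golden_queries: List[str],
    search: Callable[[str], List[str]],
    judge: Callable[[str], str],
    k: int = 10,
) -> float:
    """Fraction of top-k results judged on-topic, pooled over all queries.

    Assumption: pooling over all (query, result) pairs; a per-query
    average would be an equally plausible reading of the metric.
    """
    total = 0
    on_topic = 0
    for query in golden_queries:
        for result in search(query)[:k]:
            answer = judge(build_prompt(query, result))
            total += 1
            if answer.strip().lower().startswith("yes"):
                on_topic += 1
    return on_topic / total if total else 0.0


# Toy stand-ins so the sketch runs without a search backend or an LLM.
FAKE_INDEX: Dict[str, List[str]] = {
    "python tutorials": [
        "Intro to Python",
        "Python decorators explained",
        "Best hiking trails",
    ],
    "data engineering jobs": [
        "Data engineer roles in 2024",
        "Sourdough baking tips",
    ],
}


def stub_search(query: str) -> List[str]:
    """Return canned results for a query (hypothetical search system)."""
    return FAKE_INDEX.get(query, [])


def stub_judge(prompt: str) -> str:
    """Stand-in for the LLM call: 'yes' if any query word appears in the result."""
    lines = prompt.splitlines()
    query = lines[-2].split(": ", 1)[1].lower()
    result = lines[-1].split(": ", 1)[1].lower()
    return "yes" if any(word in result for word in query.split()) else "no"


if __name__ == "__main__":
    rate = on_topic_rate(list(FAKE_INDEX), stub_search, stub_judge, k=3)
    print(f"on-topic rate: {rate:.2f}")
```

In a real run, `stub_judge` would be swapped for a function that sends the prompt to the chosen model and returns its text reply, and `stub_search` for the search system under evaluation. Keeping both as injected callables means the metric logic can be checked independently of either.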
