Relevance Filtering for Embedding-based Retrieval
Nicholas Rossi, Juexin Lin, Feng Liu, Zhen Yang, Tony Lee, Alessandro Magnani, Ciya Liao
TL;DR
The paper tackles the precision gap in embedding-based retrieval by introducing a Cosine Adapter to map raw cosine scores to interpretable values and applying a global threshold for filtering. It further leverages a Relevance Reward Model (RRM) trained on human judgments to revise training labels and to define a relevance objective, integrating this with a multi-objective loss, typo-aware training, and an enhanced negative sampling strategy that includes semi-positives. The resulting dual-encoder model (DistilBERT) with a frozen RRM, along with stratified sampling and a cross-encoder-based relevance signal, yields substantial offline gains in exact-match and purchased-product metrics and positive online revenue lifts in Walmart's hybrid retrieval system. The key contribution is a cohesive set of techniques that align retrieval with human relevance judgments while maintaining efficiency in large-scale e-commerce contexts. Practically, these methods improve precision without severely sacrificing recall, and their online validation demonstrates real-world impact on user experience and monetization.
Abstract
In embedding-based retrieval, Approximate Nearest Neighbor (ANN) search enables efficient retrieval of similar items from large-scale datasets. While maximizing recall of relevant items is usually the goal of retrieval systems, low precision may lead to a poor search experience. Unlike lexical retrieval, which inherently limits the size of the retrieved set through keyword matching, dense retrieval via ANN search has no natural cutoff. Moreover, the cosine similarity scores of embedding vectors are often optimized via contrastive or ranking losses, which makes them difficult to interpret. Consequently, relying on a top-K or cosine-similarity cutoff is often insufficient to filter out irrelevant results effectively. This issue is prominent in product search, where the number of relevant products is often small. This paper introduces a novel relevance filtering component (called "Cosine Adapter") for embedding-based retrieval to address this challenge. Our approach maps raw cosine similarity scores to interpretable scores using a query-dependent mapping function. We then apply a global threshold on the mapped scores to filter out irrelevant results. This significantly increases the precision of the retrieved set at the expense of a small loss of recall. The effectiveness of our approach is demonstrated through experiments on both the public MS MARCO dataset and internal Walmart product search data. Furthermore, online A/B testing on the Walmart site validates the practical value of our approach in real-world e-commerce settings.
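
The filtering step described above can be illustrated with a minimal sketch. The abstract does not specify the form of the mapping function, so the code below assumes a sigmoid whose scale and shift are supplied per query; the names `adapt_score`, `filter_results`, `scale`, `shift`, and the threshold value 0.5 are all hypothetical and chosen only for illustration. In a real system the per-query parameters would presumably be predicted by a learned component rather than passed in by hand.

```python
import math
from typing import List, Tuple


def adapt_score(cosine: float, scale: float, shift: float) -> float:
    """Map a raw cosine score into (0, 1).

    Assumption: a sigmoid with query-dependent `scale` and `shift`.
    The actual mapping used by the Cosine Adapter may differ.
    """
    return 1.0 / (1.0 + math.exp(-scale * (cosine - shift)))


def filter_results(
    candidates: List[Tuple[str, float]],
    scale: float,
    shift: float,
    threshold: float = 0.5,  # one global threshold shared by all queries
) -> List[Tuple[str, float]]:
    """Keep (item_id, mapped_score) pairs whose mapped score meets the threshold."""
    kept = []
    for item_id, cosine in candidates:
        mapped = adapt_score(cosine, scale, shift)
        if mapped >= threshold:
            kept.append((item_id, mapped))
    return kept


# The same ANN candidates, scored under two hypothetical queries.
candidates = [("a", 0.9), ("b", 0.7)]

# A narrow query: only very high cosine scores count as relevant.
narrow = filter_results(candidates, scale=20.0, shift=0.8)

# A broad query: a cosine of 0.7 is already a relevant match.
broad = filter_results(candidates, scale=20.0, shift=0.6)
```

Under these assumed parameters, item "b" with raw cosine 0.7 is dropped for the narrow query and kept for the broad one, even though the threshold on the mapped score is identical. This is the point of making the mapping query-dependent: a single raw cosine cutoff cannot express both decisions.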
