Towards Robustness Analysis of E-Commerce Ranking System

Ningfei Wang, Yupin Huang, Han Cheng, Jiri Gesi, Xiaojie Wang, Vivek Mittal

TL;DR

This paper presents the first systematic measurement study of the robustness of e-commerce ranking systems and proposes a novel metric that accounts for both ranking position and item-specific information, neither of which existing metrics consider.

Abstract

Information retrieval (IR) is a pivotal component in various applications. Recent advances in machine learning (ML) have enabled the integration of ML algorithms into IR, particularly in ranking systems. While there is a plethora of research on the robustness of ML-based ranking systems, these studies largely neglect commercial e-commerce systems and fail to establish a connection between real-world and manipulated query relevance. In this paper, we present the first systematic measurement study on the robustness of e-commerce ranking systems. We define robustness as the consistency of ranking outcomes for semantically identical queries. To quantitatively analyze robustness, we propose a novel metric that considers both ranking position and item-specific information, neither of which existing metrics take into account. Our large-scale measurement study with real-world data from e-commerce retailers reveals an open opportunity to measure and improve robustness, since semantically identical queries often yield inconsistent ranking results. Based on our observations, we propose several solution directions to enhance robustness, such as the use of Large Language Models. Note that the robustness issue discussed here does not constitute an error or oversight. Rather, when a vast array of choices exists, a multitude of products can be presented in various permutations, all of which could be equally appealing. However, this extensive selection may lead to customer confusion. As e-commerce retailers use various techniques to improve the quality of search results, we hope that this research offers valuable guidance for measuring the robustness of ranking systems.
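To make the abstract's notion of robustness concrete, here is a minimal sketch of a position-weighted discrepancy between the top-k results of two semantically identical queries. It is not the metric proposed in the paper, whose definition is not given in this summary. The function names, the DCG-style logarithmic discount, and the normalization are all assumptions chosen for illustration, and the sketch covers only ranking position, not item-specific information.

```python
from math import log2


def position_weight(rank: int) -> float:
    """Assumed DCG-style discount: rank 1 weighs 1.0, later ranks weigh less."""
    return 1.0 / log2(rank + 1)


def ranking_discrepancy(results_a, results_b, k=10):
    """Illustrative discrepancy in [0, 1] between two ranked lists of item IDs.

    0.0 means the top-k lists are identical; 1.0 means they share no items.
    Item IDs are assumed to be unique within each list.
    """
    top_a, top_b = list(results_a[:k]), list(results_b[:k])
    weight_a = {item: position_weight(r) for r, item in enumerate(top_a, start=1)}
    weight_b = {item: position_weight(r) for r, item in enumerate(top_b, start=1)}

    numerator = 0.0
    denominator = 0.0
    # Visit every item that appears in either list exactly once.
    for item in dict.fromkeys(top_a + top_b):
        wa = weight_a.get(item, 0.0)  # 0.0 if the item is missing from list A
        wb = weight_b.get(item, 0.0)  # 0.0 if the item is missing from list B
        numerator += abs(wa - wb)
        denominator += wa + wb

    # Two empty result lists are treated as perfectly consistent.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical result lists for two phrasings of the same query.
    query_1 = ["item_a", "item_b", "item_c"]
    query_2 = ["item_b", "item_a", "item_c"]
    print(ranking_discrepancy(query_1, query_2, k=3))
```

Because of the discount, a swap near the top of the list raises the score more than the same swap further down, which reflects the idea that disagreements in highly visible positions matter most.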

Paper Structure (23 sections, 2 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Motivating examples: the two queries are semantically identical, but their top-3 ranking results are completely different.
  • Figure 2: Histogram of TPS and Q2Q data evaluated with the RDS metric on millions of query pairs.
  • Figure 3: Illustration of participants and semantically identical query pairs for user study. Specifically, 80% of participants regarded 80% of the query pairs as semantically identical.
  • Figure 4: Histograms of RDS on TPS data over a five-month period, with more than 20 million query pairs at each measurement point.
  • Figure 5: Comparison of histograms between the original e-commerce ranking model and two larger models on millions of TPS query pairs. Lower scores indicate greater robustness.
  • ...and 2 more figures