Towards Scalability and Extensibility of Query Reformulation Modeling in E-commerce Search

Ziqi Zhang; Yupin Huang; Quan Deng; Jinghui Xiao; Vivek Mittal; Jingyuan Deng

TL;DR

This study tackles tail-query sparsity in e-commerce search by strengthening query reformulation (QR) through scalable data mining and training innovations. The retrieval objective uses an importance-weighted contrastive loss, $\mathcal{L}_{\text{retrieval}} = -\frac{1}{N} \sum_{i,k} I(q_i,q_{ik}) \log \frac{\exp\left(s_{BE}(q_i,q_{ik})/\tau\right)}{\sum_{j\in\text{batch}} \exp\left(s_{BE}(q_i,q_j)/\tau\right)}$, and the re-ranking hard-negative loss is $\mathcal{L}_{\text{re-ranking, hard}} = \log\left[\left(1+\sum_{j\in H(q_i)} \exp s_{CE}(q_i,q_j)\right) \cdot \left(1+\sum_{k} \exp\left(-s_{CE}(q_i,q_{ik})\right)\right)\right]$. Here $s_{BE}$ and $s_{CE}$ are the bi-encoder and cross-encoder scores, $\tau$ is a temperature, $I(q_i,q_{ik})$ is the importance weight of the pair formed by query $q_i$ and its reformulation $q_{ik}$, and $H(q_i)$ is the set of hard negatives for $q_i$. The approach combines sample-importance mining, hard-negative mining via ANCE, a two-stage retrieval + re-ranking pipeline, and query normalization to extend QR to non-English and low-traffic markets. Offline results show improvements in recall@100 and NDCG@3, and online A/B tests in the Japanese, Hindi, and English markets confirm revenue and Ads gains, demonstrating the method's scalability and multilingual applicability. Overall, the work advances robust QR for smaller, multilingual e-commerce settings with practical impact on search relevance and monetization.
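The two training losses can be sketched in pure Python. This sketch assumes the retrieval loss is the usual log-softmax (InfoNCE) form, that the positive is included in its own in-batch denominator, and that the re-ranking loss takes the log of the product of its two factors. The function names, argument layout, and default temperature are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def retrieval_loss(pos_scores: Sequence[float],
                   batch_scores: Sequence[Sequence[float]],
                   importance: Sequence[float],
                   tau: float = 0.05) -> float:
    """Importance-weighted InfoNCE-style loss for the bi-encoder (sketch).

    pos_scores[n]   -- s_BE(q_i, q_ik) for the n-th positive pair
    batch_scores[n] -- s_BE(q_i, q_j) for every in-batch candidate q_j,
                       assumed to include the positive itself
    importance[n]   -- sample-importance weight I(q_i, q_ik)
    tau             -- temperature (default value is an assumption)
    """
    total = 0.0
    for s_pos, row, weight in zip(pos_scores, batch_scores, importance):
        # Numerically stable log-sum-exp of the temperature-scaled batch scores.
        m = max(s / tau for s in row)
        log_denom = m + math.log(sum(math.exp(s / tau - m) for s in row))
        # Weighted log-softmax probability of the positive.
        total += weight * (s_pos / tau - log_denom)
    return -total / len(pos_scores)


def reranking_hard_loss(neg_scores: Sequence[float],
                        pos_scores: Sequence[float]) -> float:
    """log[(1 + sum_neg exp(s)) * (1 + sum_pos exp(-s))] for one query (sketch).

    neg_scores -- s_CE(q_i, q_j) for hard negatives q_j in H(q_i)
    pos_scores -- s_CE(q_i, q_ik) for the positives q_ik
    """
    # The log of a product is a sum of logs; log1p(x) computes log(1 + x).
    return (math.log1p(sum(math.exp(s) for s in neg_scores))
            + math.log1p(sum(math.exp(-s) for s in pos_scores)))


# Toy usage: one positive pair whose in-batch candidates score 1.0 and 0.0.
print(retrieval_loss([1.0], [[1.0, 0.0]], [1.0], tau=1.0))
print(reranking_hard_loss(neg_scores=[-2.0, 0.5], pos_scores=[3.0]))
```

The importance weight scales each pair's contribution, so a pair with weight 0 is ignored. The re-ranking loss drives hard-negative scores below 0 and positive scores above 0 independently of batch composition.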

Abstract

Customer behavioral data significantly impacts e-commerce search systems. However, for less common queries, the associated behavioral data tends to be sparse and noisy, offering inadequate support to the search mechanism. To address this challenge, query reformulation has been introduced: less common queries can utilize the behavior patterns of popular queries with similar meanings. In Amazon product search, query reformulation has demonstrated its effectiveness in improving search relevance and bolstering overall revenue. Nonetheless, adapting this method for smaller or emerging businesses operating in regions with lower traffic and complex multilingual settings poses challenges of scalability and extensibility. This study addresses these challenges by constructing a query reformulation solution that functions effectively even when training data is limited in quality and scale and the language has relatively complex linguistic characteristics. In this paper, we provide an overview of the solution implemented within the Amazon product search infrastructure, which encompasses refining the data mining process, redefining model training objectives, and reshaping training strategies. The effectiveness of the proposed solution is validated through online A/B testing on search ranking and Ads matching. Notably, employing the proposed solution in search ranking resulted in increases of 0.14% and 0.29% in overall revenue in the Japanese and Hindi cases, respectively, and a 0.08% incremental gain in the English case compared to the legacy implementation; in search Ads matching, it led to a 0.36% increase in Ads revenue in the Japanese case.
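A tail query borrows behavioral signals from semantically similar popular queries, and the TL;DR describes the lookup as a two-stage retrieval + re-ranking pipeline. Below is a minimal sketch of that control flow, assuming both models are black-box scoring functions. The function `reformulate`, the cut-offs (chosen to loosely mirror recall@100 and NDCG@3), and the `token_jaccard` stand-in scorer are all hypothetical, not the production models.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def reformulate(tail_query: str,
                head_queries: List[str],
                bi_encoder_score: Scorer,
                cross_encoder_score: Scorer,
                k_retrieve: int = 100,
                k_final: int = 3) -> List[Tuple[str, float]]:
    """Two-stage QR lookup (sketch): cheap retrieval, then expensive re-ranking."""
    # Stage 1: score every head query with the cheap model and keep a shortlist.
    shortlist = sorted(head_queries,
                       key=lambda h: bi_encoder_score(tail_query, h),
                       reverse=True)[:k_retrieve]
    # Stage 2: re-score only the shortlist with the expensive model.
    reranked = sorted(((h, cross_encoder_score(tail_query, h)) for h in shortlist),
                      key=lambda pair: pair[1],
                      reverse=True)
    return reranked[:k_final]


def token_jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in scorer: Jaccard overlap of whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


heads = ["running shoes", "wireless earbuds", "red dress"]
print(reformulate("red running shoes", heads, token_jaccard, token_jaccard,
                  k_retrieve=2, k_final=1))
```

In a deployed system, stage 1 would typically be an approximate nearest-neighbor search over precomputed bi-encoder embeddings rather than a full sort. The point of the split is the same: the costly cross-encoder only sees the shortlist.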

Paper Structure (16 sections, 6 equations, 3 figures, 6 tables)

Introduction
Related work
Problem formulation
Method
Retrieval model training schema
Sample importance through behavioral mining
Re-ranking model training schema
Query normalization for denoising
Model training with hard negative mining
Experiment
Dataset generation
Offline evaluation criteria
Offline experiment result
Offline query-query relevance auditing
Online experiment
...and 1 more section

Figures (3)

Figure 1: An illustration of a) the QR pipeline in the model training stage (hollow arrows), b) the QR pipeline in the online inference system (solid arrows), c) data artifacts used in training and inference (text boxes without borders), and d) our refinements to the QR pipeline (text boxes with borders).
Figure 2: The distribution of the number of purchases per product under different queries.
Figure 3: The ANCE mining and continuous model finetuning process. Specifically, the bi-encoder model is iteratively finetuned using ANCE data derived from self-retrieved pairs minus pairs with co-purchases. Then, the cross-encoder model is finetuned with the ANCE data from the bi-encoder.
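The Figure 3 caption's mining rule (self-retrieved pairs minus pairs with co-purchase) is essentially a set difference that is recomputed each round. Here is a minimal sketch under several assumptions: the `retrieve` and `finetune` callbacks are hypothetical placeholders for the current bi-encoder and its training step, co-purchase pairs are treated as symmetric query pairs, and the per-query cap is illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple


def mine_hard_negatives(retrieved: Dict[str, List[str]],
                        co_purchase_pairs: Set[Tuple[str, str]],
                        max_per_query: int = 5) -> Dict[str, List[str]]:
    """Hard negatives = self-retrieved pairs minus pairs with co-purchase (sketch).

    retrieved         -- the current bi-encoder's top neighbors for each query
    co_purchase_pairs -- query pairs linked by purchases (assumed symmetric)
    """
    hard: Dict[str, List[str]] = {}
    for query, neighbors in retrieved.items():
        hard[query] = [n for n in neighbors
                       if n != query  # a query is not its own negative
                       and (query, n) not in co_purchase_pairs
                       and (n, query) not in co_purchase_pairs][:max_per_query]
    return hard


def ance_rounds(queries: List[str],
                retrieve: Callable[[str], List[str]],
                finetune: Callable[[Dict[str, List[str]]], None],
                co_purchase_pairs: Set[Tuple[str, str]],
                rounds: int = 2) -> Dict[str, List[str]]:
    """Iterate: retrieve with the current model, mine negatives, finetune."""
    hard: Dict[str, List[str]] = {}
    for _ in range(rounds):
        # Re-retrieve each round so the negatives track the updated model.
        retrieved = {q: retrieve(q) for q in queries}
        hard = mine_hard_negatives(retrieved, co_purchase_pairs)
        finetune(hard)
    return hard


neighbors = {"iphone case": ["iphone case", "iphone cover", "iphone charger", "samsung case"]}
co_purchase = {("iphone case", "iphone cover")}
print(mine_hard_negatives(neighbors, co_purchase))
```

Following the caption, the cross-encoder stage would then be finetuned on the negatives mined from the final bi-encoder round.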
