Table of Contents
Fetching ...

Relevance Matters: A Multi-Task and Multi-Stage Large Language Model Approach for E-commerce Query Rewriting

Aijun Dai, Jixiang Zhang, Haiqing Hu, Guoyu Tang, Lin Liu, Ziguang Cheng

TL;DR

This research proposes a multi-task and multi-stage query rewriting framework grounded in large language models (LLMs) and injects the relevance task into query rewriting, resulting in elevating the search results'relevance and boosting the number of purchases made per user.

Abstract

For e-commerce search, user experience is measured by users' behavioral responses to returned products, like click-through rate and conversion rate, as well as the relevance between returned products and search queries. Consequently, relevance and user conversion constitute the two primary objectives in query rewriting, a strategy to bridge the lexical gap between user expressions and product descriptions. This research proposes a multi-task and multi-stage query rewriting framework grounded in large language models (LLMs). Critically, in contrast to previous works that primarily emphasized rewritten query generation, we inject the relevance task into query rewriting. Specifically, leveraging a pretrained model on user data and product information from JD.com, the approach initiates with multi-task supervised fine-tuning (SFT) comprising of the rewritten query generation task and the relevance tagging task between queries and rewrites. Subsequently, we employ Group Relative Policy Optimization (GRPO) for the model's objective alignment oriented toward enhancing the relevance and stimulating user conversions. Through offline evaluation and online A/B test, our framework illustrates substantial improvements in the effectiveness of e-commerce query rewriting, resulting in elevating the search results' relevance and boosting the number of purchases made per user (UCVR). Since August 2025, our approach has been implemented on JD.com, one of China's leading online shopping platforms.

Relevance Matters: A Multi-Task and Multi-Stage Large Language Model Approach for E-commerce Query Rewriting

TL;DR

This research proposes a multi-task and multi-stage query rewriting framework grounded in large language models (LLMs) and injects the relevance task into query rewriting, resulting in elevating the search results'relevance and boosting the number of purchases made per user.

Abstract

For e-commerce search, user experience is measured by users' behavioral responses to returned products, like click-through rate and conversion rate, as well as the relevance between returned products and search queries. Consequently, relevance and user conversion constitute the two primary objectives in query rewriting, a strategy to bridge the lexical gap between user expressions and product descriptions. This research proposes a multi-task and multi-stage query rewriting framework grounded in large language models (LLMs). Critically, in contrast to previous works that primarily emphasized rewritten query generation, we inject the relevance task into query rewriting. Specifically, leveraging a pretrained model on user data and product information from JD.com, the approach initiates with multi-task supervised fine-tuning (SFT) comprising of the rewritten query generation task and the relevance tagging task between queries and rewrites. Subsequently, we employ Group Relative Policy Optimization (GRPO) for the model's objective alignment oriented toward enhancing the relevance and stimulating user conversions. Through offline evaluation and online A/B test, our framework illustrates substantial improvements in the effectiveness of e-commerce query rewriting, resulting in elevating the search results' relevance and boosting the number of purchases made per user (UCVR). Since August 2025, our approach has been implemented on JD.com, one of China's leading online shopping platforms.
Paper Structure (22 sections, 18 equations, 2 figures, 7 tables)

This paper contains 22 sections, 18 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Multi-Task SFT and RL Architecture. (1)In SFT, LLM generates rewrites with the prompt as inputs and relev_tag is constructed by Offline Search Engine and Relevance Model. (2)In RL, LLM is optimized by Query-Based Normalized GRPO. The integrated reward consists of tag reward, productive reward, increment reward, relevance reward and rule reward.
  • Figure 2: Comparison of Recall Rate and Relevance Score