Long or Short or Both? An Exploration on Lookback Time Windows of Behavioral Features in Product Search Ranking

Qi Liu, Atul Singh, Jingbo Liu, Cun Mu, Zheng Yan, Jan Pedersen

TL;DR

Problem: determine how lookback time windows for (query, product)-level behavioral features affect product search ranking in eCommerce. Method: measure long ($|T|=730$) and short ($|T|=30$) windows, using a Bayesian-smoothed posterior $br_{q,p} = \frac{\sum_{t} b_{q,p}^{(t)} + \alpha}{\sum_{t} e_{q,p}^{(t)} + \alpha + \beta}$ to generate features, and evaluate Baseline, Model A, Model B, and Model C within a tree-based ranking framework; the key innovation is to add query-level vertical signals to guide the integration of features from different windows. Results: long windows help stable verticals like Food/Consumables, short windows help dynamic ones like Fashion/ETS; naive combination harms performance, but vertical-guided multi-window integration (Model C) yields statistically significant gains in engagement and GMV in online A/B tests. Significance: demonstrates a scalable approach to more robust ranking by leveraging temporal diversity and query context, with practical benefits for eCommerce search, and sets the stage for broader horizon and signal expansion.
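
To make the smoothed rate in the TL;DR concrete, here is a minimal Python sketch of how such a feature might be computed for one (query, product) pair over a long and a short window. It is illustrative only, not the authors' implementation: the function names, the reading of $b$ as daily behavior counts (for example clicks) and $e$ as daily exposure counts, and the prior values `alpha=1.0`, `beta=9.0` are all assumptions made for the example.

```python
from typing import Sequence


def smoothed_rate(behaviors: Sequence[float], exposures: Sequence[float],
                  alpha: float, beta: float) -> float:
    """Return (sum(b) + alpha) / (sum(e) + alpha + beta).

    `behaviors` and `exposures` are assumed to be per-day counts for a single
    (query, product) pair; alpha and beta are smoothing pseudo-counts.
    """
    if len(behaviors) != len(exposures):
        raise ValueError("behaviors and exposures must have the same length")
    return (sum(behaviors) + alpha) / (sum(exposures) + alpha + beta)


def windowed_rate(daily_b: Sequence[float], daily_e: Sequence[float],
                  window: int, alpha: float, beta: float) -> float:
    """Smoothed rate over only the most recent `window` days (hypothetical helper)."""
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    return smoothed_rate(daily_b[-window:], daily_e[-window:], alpha, beta)


# Toy history: 730 days, oldest first. The product was rarely clicked for
# 700 days, then became popular during the last 30 days.
daily_clicks = [1] * 700 + [5] * 30
daily_exposures = [10] * 730

long_rate = windowed_rate(daily_clicks, daily_exposures, 730, alpha=1.0, beta=9.0)
short_rate = windowed_rate(daily_clicks, daily_exposures, 30, alpha=1.0, beta=9.0)
cold_start = smoothed_rate([], [], alpha=1.0, beta=9.0)  # no history: prior mean
```

In this toy history the 730-day rate stays close to the old level while the 30-day rate reflects the recent shift, which is the trade-off between stable and dynamic verticals that the TL;DR describes. With no history at all, the rate falls back to the prior mean `alpha / (alpha + beta)`.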

Abstract

Customer shopping behavioral features are core to product search ranking models in eCommerce. In this paper, we investigate the effect of lookback time windows when aggregating these features at the (query, product) level over history. By studying the pros and cons of using long and short time windows, we propose a novel approach to integrating these historical behavioral features of different time windows. In particular, we address the criticality of using query-level vertical signals in ranking models to effectively aggregate all information from different behavioral features. Anecdotal evidence for the proposed approach is also provided using live product search traffic on Walmart.com.

Paper Structure (9 sections, 2 equations, 2 figures, 5 tables)

Figures (2)

  • Figure 1: Distribution of behavioral nodes under vertical nodes for Fashion and Consumables. We summarize the percentage of behavioral tree nodes under each vertical. Each behavioral feature name has three sub-strings: the first indicates the behavioral data source (web or app); the second indicates the time window length (730-day or 30-day); the third indicates the behavior type (cr = click rate; ar = add-to-cart (ATC) rate; or = order rate).
  • Figure 2: Example of the customer search ranking experience, comparing the baseline with Model C. The comparison was performed in December 2023 for the query "new year's eve". The baseline incorrectly prioritizes many 2023 New Year's Eve supplies over 2024 products, which should rank higher as the 2024 event approaches. In contrast, Model C ranks 2024 products above those for 2023. In this example, the 2024 glasses item is elevated from position 46 to 13, while the 2023 hat item is demoted from position 21 to beyond 60, moving it off the first page of search results.