Learning to Comparison-Shop

Jie Tang, Daochen Zha, Xin Liu, Huiji Gao, Liwei He, Stephanie Moyerman, Sanjeev Katariya

TL;DR

This paper tackles the mismatch between traditional search ranking and users' comparison-shopping behavior in online marketplaces by introducing Learning-to-Comparison-Shop (LTCS). LTCS jointly trains a lightweight pointwise initial ranker and a setwise transformer-based re-ranker to emulate the two-stage decision process of evaluation followed by comparison, using a behavior-aligned co-training objective. In production at Airbnb, LTCS yields an offline NDCG gain (+1.7%) and an online booking conversion-rate improvement (+0.6%), along with improved user efficiency during search. The work shows that behavior-aligned, two-stage training can outperform strong baselines and offers a practical blueprint for deploying complex ranking systems in large-scale marketplaces.

Abstract

In online marketplaces like Airbnb, users frequently engage in comparison shopping before making purchase decisions. Despite the prevalence of this behavior, a significant disconnect persists between mainstream e-commerce search engines and users' comparison needs. Traditional ranking models often evaluate items in isolation, disregarding the context in which users compare multiple items on a search results page. While recent advances in deep learning have sought to improve ranking accuracy, diversity, and fairness by encoding listwise context, the challenge of aligning search rankings with user comparison shopping behavior remains inadequately addressed. In this paper, we propose a novel ranking architecture - Learning-to-Comparison-Shop (LTCS) System - that explicitly models and learns users' comparison shopping behaviors. Through extensive offline and online experiments, we demonstrate that our approach yields statistically significant gains in key business metrics - improving NDCG by 1.7% and boosting booking conversion rate by 0.6% in A/B testing - while also enhancing user experience. We also compare our model against state-of-the-art approaches and demonstrate that LTCS significantly outperforms them.

Paper Structure

This paper contains 17 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Top: An illustration of how users book listings on Airbnb, encompassing two stages: evaluation and comparison. In the evaluation stage, users quickly browse through the available listings to identify those that pique their interest. During the comparison stage, they compare a small selection of candidate listings across various dimensions to make a final decision. Bottom: We model this comparison shopping behavior using a pointwise initial ranker and a setwise re-ranker for the above two stages, respectively. These two networks are co-trained to predict users' booking probabilities.
  • Figure 2: Overall system diagram: For a given candidate item, the initial ranker computes its initial ranking logit and its initial ranking embedding from query and item features. To compute the context embedding, an encoder-only Transformer is applied over the top-k items' initial ranking embeddings. Finally, the candidate item's initial embedding and context embedding are concatenated and passed through an MLP to generate the re-ranker logit. The loss is computed as a weighted sum of the initial ranker loss and the re-ranker loss.
  • Figure 3: Left: Impact of Re-Ranker Input Length on NDCG Gain: Increasing Input Length Leads to Higher NDCG Gains. Right: Effect of encoder-only Transformer Layers on NDCG Gain: Increasing Layers Improves Performance.
  • Figure 4: Effect of Re-Ranker Loss Weight on NDCG Gain: Higher Weights Improve Re-Ranker Performance and Benefit the Initial Ranker. The X-axis represents the re-ranker loss weight $\alpha$, while the Y-axis shows the NDCG gain relative to a ranking system using only the initial ranker.
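
The data flow described in the Figure 2 caption can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the layer sizes, the single-head attention layer standing in for the encoder-only Transformer (no layer norm or feed-forward block), the binary cross-entropy losses, and the combination `l_init + alpha * l_rerank` are all assumptions, as are the function and parameter names. The captions only state that the loss is a weighted sum and that $\alpha$ is the re-ranker loss weight.

```python
import numpy as np

def init_params(rng, d_feat, d_emb, scale=0.1):
    """Random weights for the sketch; all shapes are hypothetical."""
    shapes = {
        "W_emb": (d_feat, d_emb),   # initial ranker: features -> embedding
        "w_init": (d_emb,),         # embedding -> initial ranking logit
        "W_q": (d_emb, d_emb),      # attention projections
        "W_k": (d_emb, d_emb),
        "W_v": (d_emb, d_emb),
        "W_h": (2 * d_emb, d_emb),  # re-ranker MLP hidden layer
        "w_out": (d_emb,),          # hidden -> re-ranker logit
    }
    return {name: rng.normal(0.0, scale, shape) for name, shape in shapes.items()}

def initial_ranker(p, feats):
    """Pointwise: each item is scored from its own query+item features."""
    emb = np.tanh(feats @ p["W_emb"])        # (n, d_emb)
    return emb @ p["w_init"], emb            # logits (n,), embeddings

def context_embeddings(p, emb):
    """Single-head self-attention with a residual connection (assumed
    simplification of the encoder-only Transformer)."""
    q, k, v = emb @ p["W_q"], emb @ p["W_k"], emb @ p["W_v"]
    scores = q @ k.T / np.sqrt(emb.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)   # stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return emb + attn @ v                    # (k, d_emb)

def reranker(p, emb_topk):
    """Setwise: concatenate each item's embedding with its context embedding."""
    ctx = context_embeddings(p, emb_topk)
    joint = np.concatenate([emb_topk, ctx], axis=1)       # (k, 2 * d_emb)
    hidden = np.maximum(joint @ p["W_h"], 0.0)            # ReLU
    return hidden @ p["w_out"]               # re-ranker logits (k,)

def bce_with_logits(logits, labels):
    """Numerically stable mean binary cross-entropy on raw logits."""
    return np.mean(np.maximum(logits, 0.0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

def co_training_loss(p, feats, labels, k, alpha):
    init_logits, emb = initial_ranker(p, feats)
    top = np.argsort(-init_logits)[:k]       # top-k items by initial logit
    rerank_logits = reranker(p, emb[top])
    l_init = bce_with_logits(init_logits, labels)
    l_rerank = bce_with_logits(rerank_logits, labels[top])
    # Assumed form of the weighted sum; alpha is the re-ranker loss weight.
    return l_init + alpha * l_rerank, l_init, l_rerank, rerank_logits

rng = np.random.default_rng(0)
n, k, d_feat, d_emb = 20, 8, 16, 4           # toy sizes, not from the paper
params = init_params(rng, d_feat, d_emb)
feats = rng.normal(size=(n, d_feat))         # synthetic query+item features
labels = np.zeros(n)
labels[2] = 1.0                              # one booked listing
total, l_init, l_rerank, rerank_logits = co_training_loss(
    params, feats, labels, k=k, alpha=0.5)
```

Because the re-ranker consumes the initial ranker's embeddings, gradients of the re-ranker term would also reach the initial ranker's weights under co-training, which is consistent with the Figure 4 caption's note that a higher re-ranker loss weight also benefits the initial ranker.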