Taxonomy-based Negative Sampling In Personalized Semantic Search for E-commerce

Uthman Jinadu, Siawpeng Er, Le Yu, Chen Liang, Bingxin Li, Yi Ding, Aleksandar Velkoski

TL;DR

The paper tackles semantic product retrieval in large-scale e-commerce by introducing a taxonomy-based hard-negative sampling (TB-HNS) strategy and a personalized semantic engine. It deploys a two-tower bi-encoder with a personalized variant that fuses user demographics and past purchases, trained with Multiple Negatives Ranking Loss and TB-HNS to learn fine-grained distinctions among similar items. Offline results show recall gains over BM25, DistilledBERT, and ANCE, while online A/B tests reveal improvements in Conversion Rate, Add-to-Cart Rate, and Average Order Value, all within latency targets. The work demonstrates generalization to a public dataset, improves cold-start item retrieval, and offers practical deployment lessons for large-scale e-commerce search systems with reduced negative sampling overhead.

Abstract

Large retail outlets offer products that may be domain-specific, which requires a model that can understand subtle differences between similar items. The sampling techniques used to train such models are often computationally expensive or logistically challenging. These models also do not factor in users' previous purchase patterns or behavior, and so retrieve items that are irrelevant to them. We present a semantic retrieval model for e-commerce search that embeds queries and products into a shared vector space and leverages a novel taxonomy-based hard-negative sampling (TB-HNS) strategy to mine contextually relevant yet challenging negatives. To further tailor retrievals, we incorporate user-level personalization by modeling each customer's past purchase history and behavior. In offline experiments, our approach outperforms BM25, ANCE, and leading neural baselines on Recall@K, while live A/B testing shows substantial uplifts in conversion rate, add-to-cart rate, and average order value. We also demonstrate that our taxonomy-driven negatives reduce training overhead and accelerate convergence, and we share practical lessons from deploying this system at scale.
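
To make the training objective more concrete, the following is a minimal sketch of an in-batch softmax ranking loss of the kind the TL;DR names, using dot-product scores between query and item embeddings. It is an illustration under stated assumptions and not the authors' implementation: the function names, the plain-list embeddings, and the way extra hard negatives are appended to the candidate pool are all hypothetical choices made here.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_ranking_loss(query_embs, item_embs, hard_negative_embs=()):
    """Mean softmax cross-entropy where item i is the positive for query i.

    Every other item in the batch, plus any extra hard negatives
    (an assumption of this sketch), acts as a negative for that query.
    """
    candidates = list(item_embs) + list(hard_negative_embs)
    total = 0.0
    for i, q in enumerate(query_embs):
        scores = [dot(q, c) for c in candidates]
        top = max(scores)  # subtract the max for numerical stability
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[i]  # negative log-probability of the positive
    return total / len(query_embs)


# Toy example with 2-dimensional embeddings (values are made up).
queries = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_ranking_loss(queries, items)
with_hard = in_batch_ranking_loss(queries, items, hard_negative_embs=[[0.9, 0.1]])
print(aligned < with_hard)
```

In this toy setup a hard negative that sits close to a positive raises the loss, which is the pressure that pushes the encoders to separate near-identical items.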

Paper Structure

This paper contains 40 sections, 8 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Retrieval system architecture. Offline (bottom), item metadata from the catalog are encoded by the item Encoder to produce item embeddings, which are indexed in an ANN service. Online (top/bottom), a user query is encoded by the Semantic Engine to a query embedding, matched by the ANN module against the prebuilt index; candidates are then de-duplicated/filtered/merged and sent to a multi-stage ranking stack before display to the customer. The blue module marks the component we modify: the Semantic Engine. Our contribution is to train this model with our novel taxonomy-based hard-negative sampling, enabling finer discrimination among closely related products. Personalized variant: the Semantic Engine can fuse customer features and past purchases $(c,\;h_{\text{pur}})$ with the query via a dense layer to form a personalized query embedding $q_c$ before ANN retrieval; the ANN index and downstream ranking remain identical.
  • Figure 2: Semantic Engine without personalization (left): a two-tower bi-encoder where the query and item are encoded and scored with a dot product. Enhanced Personalized Semantic Engine (right): augments the baseline with a customer tower that fuses the query with profile features $c$ and purchase history $h_{\text{pur}}$ via a dense layer to form a personalized query embedding $q_c$; the item tower and similarity function remain unchanged.
  • Figure 3: Taxonomy-based hard-negative sampling. For each query $Q_i$, the ground-truth positive item $P_i^\star$ is shown in red. Candidate hard negatives $P_{i,m}$ are shown in green, with different shades indicating different hard-negative items. Negatives are sampled from sibling items under the same parent category in the product taxonomy as $P_i^\star$. This yields "near-miss" negatives, contextually similar but not identical to the positive, so that the model learns fine-grained distinctions (e.g., brand, size, finish) rather than relying on broad category features.
  • Figure 4: Embedding similarity scores for infrequent and frequent shopper queries before and after personalization. Higher scores indicate better alignment with relevant items. The results demonstrate that personalization improves search relevance by increasing embedding similarity scores across both shopper types. After personalization, both distributions shift upward, with a larger median lift for infrequent shoppers; the lower tail and outliers shrink and the interquartile range narrows, indicating more stable relevance. Frequent shoppers begin higher and still gain (refinement), while the gap between the two cohorts narrows (correction of underspecified queries). Overall, personalization recovers missing brand/attribute cues for broad queries and sharpens already-specific ones.
  • Figure : Taxonomy-Based Hard Negative Sampling
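
The sibling-sampling idea described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the flat `item_to_parent` mapping, the helper names, the toy catalog, and the choice to return fewer than `m` negatives when a category has too few siblings are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_sibling_index(item_to_parent):
    """Group item ids by their parent category in the taxonomy."""
    children = defaultdict(list)
    for item_id, parent in item_to_parent.items():
        children[parent].append(item_id)
    return children


def sample_hard_negatives(positive_id, item_to_parent, children, m, rng=random):
    """Sample up to m siblings of the positive item as hard negatives.

    Siblings share the positive's parent category, so they are
    contextually similar to it without being the same item.
    """
    parent = item_to_parent[positive_id]
    siblings = [i for i in children[parent] if i != positive_id]
    # Assumption: if fewer than m siblings exist, return all of them.
    return rng.sample(siblings, min(m, len(siblings)))


# Hypothetical toy catalog: item id -> parent category.
item_to_parent = {
    "paint_white_1gal": "interior_paint",
    "paint_white_5gal": "interior_paint",
    "paint_gray_1gal": "interior_paint",
    "hammer_16oz": "hammers",
}
children = build_sibling_index(item_to_parent)
negatives = sample_hard_negatives(
    "paint_white_1gal", item_to_parent, children, m=2, rng=random.Random(0)
)
print(sorted(negatives))
```

Because the lookup is a dictionary access followed by a small random draw, this kind of sampling needs no model inference or index refresh during training, which is consistent with the reduced sampling overhead the abstract reports.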