ESANS: Effective and Semantic-Aware Negative Sampling for Large-Scale Retrieval Systems

Haibo Xing, Kanefumi Matsuyama, Hao Deng, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

TL;DR

This paper proposes Effective and Semantic-Aware Negative Sampling (ESANS), which integrates two key components: Effective Dense Interpolation Strategy (EDIS) and Multimodal Semantic-Aware Clustering (MSAC).

Abstract

Industrial recommendation systems typically involve a two-stage process, retrieval and ranking, that aims to match users with millions of items. In the retrieval stage, classic embedding-based retrieval (EBR) methods depend on effective negative sampling techniques to enhance both performance and efficiency. However, existing techniques often suffer from false negatives, a high cost of ensuring sampling quality, and a lack of semantic information. To address these limitations, we propose Effective and Semantic-Aware Negative Sampling (ESANS), which integrates two key components: Effective Dense Interpolation Strategy (EDIS) and Multimodal Semantic-Aware Clustering (MSAC). EDIS generates virtual samples within the low-dimensional embedding space to improve the diversity and density of the sampling distribution while minimizing computational costs. MSAC refines the negative sampling distribution by hierarchically clustering item representations based on multimodal information (visual, textual, behavioral), ensuring semantic consistency and reducing false negatives. Extensive offline and online experiments demonstrate the superior efficiency and performance of ESANS.
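
To make the idea of generating virtual samples in the embedding space concrete, here is a minimal sketch. It is not the paper's EDIS procedure, since the abstract does not give the interpolation rule. It assumes plain linear (mixup-style) interpolation between pairs of already-sampled negative embeddings. The function name `interpolate_negatives` and the parameters `num_virtual` and `alpha_range` are hypothetical and chosen only for illustration.

```python
import random


def interpolate_negatives(negatives, num_virtual, alpha_range=(0.0, 1.0), seed=0):
    """Build virtual negatives as convex combinations of real negative embeddings.

    Assumption: linear interpolation between two distinct sampled negatives.
    The abstract does not specify the actual EDIS rule.
    """
    rng = random.Random(seed)
    virtual = []
    for _ in range(num_virtual):
        # Pick two distinct real negatives to interpolate between.
        a, b = rng.sample(negatives, 2)
        # Draw the mixing weight; alpha=1 reproduces a, alpha=0 reproduces b.
        alpha = rng.uniform(*alpha_range)
        virtual.append([alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)])
    return virtual


# Three real negative embeddings in a 2-dimensional space (toy values).
real_negatives = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
virtual_negatives = interpolate_negatives(real_negatives, num_virtual=4)
```

Because the interpolation operates on embeddings that have already been computed, producing extra negatives this way costs only a few vector operations per sample and requires no additional item-tower forward passes. This is one plausible reading of how such a strategy can increase sampling density at low computational cost.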

Paper Structure

This paper contains 22 sections, 14 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Visual diagram of our ESANS compared with other methods. Each method has sampled ten negatives equally.
  • Figure 2: Our proposed ESANS framework. a) Multimodal-aligned Technique. b) Vector Quantized Clustering with Cascaded Codebooks. c) Semantic-Aware Negative Sampling & Effective Dense Interpolation Strategy (EDIS).
  • Figure 3: Visualization of items in the representation space during secondary clustering. Although items 1-3 and items 4-5 have similar mean embeddings, their embeddings differ significantly in each view, so they are assigned to different secondary clusters. Leveraging three modalities significantly enhances clustering accuracy.
  • Figure 4: The performance of ESANS under different hyper-parameters ($K_p$, $K_s$, and $\lambda$) on the #A1-#A4 industrial datasets.
  • Figure :