Light-weight End-to-End Graph Interest Network for CTR Prediction in E-commerce Search

Pipi Peng, Yunqing Jia, Ziqiang Zhou, murmurhash, Zichong Xiao

TL;DR

This work tackles CTR prediction in e-commerce search by proposing the Light-weight End-to-End Graph Interest Network (EGIN), which jointly learns query-item graph embeddings and CTR prediction in a single pipeline. It introduces a query-item heterogeneous graph with three edge types (item2item, query2query, query2item), a light-weight sampling mechanism that avoids graph engines, and a multi-interest network that fuses graph-derived signals into CTR prediction. End-to-end training optimizes both the CTR loss and the graph-embedding losses, yielding gains on both public and industrial data, including an online CTR uplift of 2.76% in production. The approach offers a scalable, plug-in framework for leveraging graph structure in search CTR and can be extended to incorporate additional node types and data sources.

Abstract

Click-through-rate (CTR) prediction is essential to improving user experience and revenue in e-commerce search. With the development of deep learning, graph-based methods have been widely used to exploit graph structure extracted from user behaviors and other information to aid embedding learning. However, most previous graph-based methods focus on recommendation scenarios, so their graph structures depend heavily on the sequential information of items in user behaviors and ignore the sequential signal of queries and query-item correlation. In this paper, we propose a new approach named Light-weight End-to-End Graph Interest Network (EGIN) to effectively mine users' search interests and address these challenges. (i) EGIN uses the correlation and sequential information of queries and items from the search system to build a heterogeneous graph for better CTR prediction in e-commerce search. (ii) EGIN's graph embedding learning shares the same training input as CTR prediction and is trained jointly with it, making the end-to-end framework easy to deploy in large-scale search systems. The proposed EGIN is composed of three parts: a query-item heterogeneous graph, light-weight graph sampling, and a multi-interest network. The query-item heterogeneous graph captures the correlation and sequential information of queries and items efficiently through the proposed light-weight graph sampling. The multi-interest network uses graph embeddings to capture various similarity relationships between queries and items and enhance the final CTR prediction. We conduct extensive experiments on both public and industrial datasets to demonstrate the effectiveness of the proposed EGIN. In addition, the training cost of graph learning is low relative to the main CTR prediction task, ensuring efficiency in practical applications.
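
As a rough sketch of what joint training could look like, the overall objective may be written as a weighted sum of the CTR loss and one graph-embedding loss per edge type. The weights $\lambda$, the per-edge-type split, and the binary cross-entropy form of the CTR loss are assumptions made here for illustration; the summary above only states that graph embedding learning and CTR prediction share the same input and are trained jointly.

$$
\mathcal{L} = \mathcal{L}_{\mathrm{CTR}} + \lambda_{i2i}\,\mathcal{L}_{i2i} + \lambda_{q2q}\,\mathcal{L}_{q2q} + \lambda_{q2i}\,\mathcal{L}_{q2i},
\qquad
\mathcal{L}_{\mathrm{CTR}} = -\frac{1}{N}\sum_{n=1}^{N}\Big[\,y_n \log \hat{y}_n + (1 - y_n)\log(1 - \hat{y}_n)\Big]
$$

Here $y_n \in \{0, 1\}$ is the click label of the $n$-th sample and $\hat{y}_n$ is the predicted click probability.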

Paper Structure (21 sections, 11 equations, 4 figures, 5 tables, 2 algorithms)

Figures (4)

  • Figure 1: The framework of the proposed EGIN model. A unified input is used for both graph learning and the CTR prediction network. The query-item heterogeneous graph is constructed from the user behavior sequence, and embedding learning is conducted on its edges. We build i2i edges between every pair of neighbors within a distance of 2 in the click sequence and build q2i/q2q pairs under time and category constraints. The same behavior sequence is provided to the CTR prediction network, where multiple similarity relationships are calculated from the graph embeddings.
  • Figure 2: Query-item heterogeneous graph structure. The user behavior sequence contains the query and click sequences ordered in time. First, neighboring items are connected using a window size of 2. Then queries are segmented into sessions based on their semantic similarity and time of occurrence, and all queries within the same session are connected. Finally, every query is connected to its nearby items within the time window that satisfy the categorical constraints with the query, capturing item-query correlation.
  • Figure 3: CTR prediction performance on the industrial dataset for different mixtures of the user click sequence and the seeds sequence as input to graph learning. While the click sequence contains more inter-category transitions, the seeds sequence concentrates more on items corresponding to the search intention. A mixture of $\mathbf{80\%}$ click sequence and $\mathbf{20\%}$ seeds sequence achieves the best result in our experiment.
  • Figure 4: Graph embedding similarity matrix of several randomly selected item pairs from our industrial dataset.
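
The edge-construction rules described in the captions of Figures 1 and 2 can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: the `Event` record, the function names, the thresholds (`session_gap`, `min_sim`, `q2i_window`), the token-overlap stand-in for semantic similarity, and the interpretation of "category constraint" as an exact category match are all assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """One entry of a user behavior sequence (hypothetical schema)."""
    kind: str      # "query" or "click"
    node_id: str   # query text or item id
    ts: int        # timestamp in seconds
    category: str  # category of the item, or predicted category of the query


def token_jaccard(a: str, b: str) -> float:
    """Token overlap as a stand-in for semantic similarity (assumption)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def build_edges(events, i2i_window=2, session_gap=1800, min_sim=0.2,
                q2i_window=600):
    """Return i2i, q2q and q2i edge sets for one user's behavior sequence."""
    events = sorted(events, key=lambda e: e.ts)
    clicks = [e for e in events if e.kind == "click"]
    queries = [e for e in events if e.kind == "query"]

    # item2item: link each click to the next `i2i_window` clicks.
    i2i = set()
    for idx, src in enumerate(clicks):
        for dst in clicks[idx + 1: idx + 1 + i2i_window]:
            if src.node_id != dst.node_id:
                i2i.add((src.node_id, dst.node_id))

    # query2query: start a new session when the time gap is too large or
    # the query is too dissimilar to the previous one, then connect all
    # pairs of queries inside each session.
    sessions, current = [], []
    for q in queries:
        if current and (q.ts - current[-1].ts > session_gap
                        or token_jaccard(q.node_id, current[-1].node_id) < min_sim):
            sessions.append(current)
            current = []
        current.append(q)
    if current:
        sessions.append(current)
    q2q = set()
    for session in sessions:
        for a, b in combinations(session, 2):
            if a.node_id != b.node_id:
                q2q.add((a.node_id, b.node_id))

    # query2item: link a query to clicks that are close in time and
    # share its category.
    q2i = {(q.node_id, c.node_id)
           for q in queries for c in clicks
           if abs(c.ts - q.ts) <= q2i_window and c.category == q.category}

    return {"i2i": i2i, "q2q": q2q, "q2i": q2i}
```

Because the edges are derived directly from the behavior sequence that already feeds the CTR model, a sketch like this needs no separate graph engine, which is in the spirit of the light-weight sampling the summary describes.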