Light-weight End-to-End Graph Interest Network for CTR Prediction in E-commerce Search
Pipi Peng, Yunqing Jia, Ziqiang Zhou, murmurhash, Zichong Xiao
TL;DR
This work tackles CTR prediction in e-commerce search by proposing the Light-weight End-to-End Graph Interest Network (EGIN), which jointly learns query-item graph embeddings and CTR prediction in a single pipeline. It introduces a query-item heterogeneous graph with three edge types (item2item, query2query, query2item), a light-weight graph sampling mechanism that avoids a separate graph engine, and a multi-interest network that fuses graph-derived signals into CTR prediction. End-to-end training optimizes the CTR loss together with the graph-embedding losses and improves results on both public and industrial datasets, including a 2.76% online CTR uplift in production. The approach offers a scalable, plug-in framework for exploiting graph structure in search CTR prediction and can be extended to additional node types and data sources.
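The three edge types can be made concrete with a small sketch. Assuming each search session is logged as a time-ordered list of (query, clicked items) pairs, the hypothetical function `build_hetero_edges` below counts query2item edges between a query and the items clicked under it, item2item edges between consecutive clicks, and query2query edges between consecutive distinct queries. This is an illustrative construction only, not EGIN's exact edge definition or weighting.

```python
from collections import Counter

def build_hetero_edges(sessions):
    """Count edges of three types from time-ordered search sessions.

    Each session is a list of (query, clicked_items) tuples in time order.
    Hypothetical construction, for illustration only:
      query2item:  a query and each item clicked under it
      item2item:   consecutive clicked items within a session
      query2query: consecutive distinct queries within a session
    """
    edges = {"query2item": Counter(), "item2item": Counter(), "query2query": Counter()}
    for session in sessions:
        prev_query, prev_item = None, None  # reset at every session boundary
        for query, clicked in session:
            if prev_query is not None and prev_query != query:
                edges["query2query"][(prev_query, query)] += 1
            for item in clicked:
                edges["query2item"][(query, item)] += 1
                if prev_item is not None and prev_item != item:  # skip self-loops
                    edges["item2item"][(prev_item, item)] += 1
                prev_item = item
            prev_query = query
    return edges

sessions = [
    [("running shoes", ["i1", "i2"]), ("trail shoes", ["i3"])],
    [("running shoes", ["i2"])],
]
edges = build_hetero_edges(sessions)
print(edges["query2item"][("running shoes", "i2")])  # 2
```

Counting edges in one pass over session logs keeps construction cheap; normalizing the counts into sampling probabilities would be one possible next step.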
Abstract
Click-through-rate (CTR) prediction has an essential impact on user experience and revenue in e-commerce search. With the development of deep learning, graph-based methods have been widely used to exploit graph structure extracted from user behaviors and other information to aid embedding learning. However, most previous graph-based methods focus on recommendation scenarios, so their graph structures depend heavily on the sequential information of items in user behaviors, ignoring the sequential signals of queries and query-item correlations. In this paper, we propose a new approach named Light-weight End-to-End Graph Interest Network (EGIN) to effectively mine users' search interests and address these limitations. (i) EGIN uses the correlation and sequential information of queries and items from the search system to build a heterogeneous graph for better CTR prediction in e-commerce search. (ii) EGIN's graph embedding learning shares the same training input as CTR prediction and is trained jointly with it, making the end-to-end framework easy to deploy in large-scale search systems. The proposed EGIN is composed of three parts: a query-item heterogeneous graph, light-weight graph sampling, and a multi-interest network. The query-item heterogeneous graph efficiently captures the correlation and sequential information of queries and items through the proposed light-weight graph sampling. The multi-interest network uses the graph embeddings to capture various similarity relationships between queries and items, enhancing the final CTR prediction. We conduct extensive experiments on both public and industrial datasets to demonstrate the effectiveness of the proposed EGIN. Moreover, the training cost of graph learning is relatively low compared with that of the main CTR prediction task, ensuring efficiency in practical applications.
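The joint training described in (ii) can be sketched as a CTR binary cross-entropy plus a weighted graph-embedding loss computed from the same batch. In this minimal pure-Python sketch, graph positives are the clicked (query, item) pairs already present in the training batch and negatives are other items from that batch, so no external graph engine is queried. The function names, the weight `lam`, the skip-gram-style negative-sampling loss, and the in-batch sampling scheme are all assumptions for illustration, not EGIN's published formulation.

```python
import math
import random

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bce_with_logits(z, y):
    """Stable binary cross-entropy for a logit z and a 0/1 label y."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def joint_loss(batch, emb, ctr_logits, lam=0.1, num_neg=2, rng=None):
    """Return (total, ctr_loss, graph_loss) for one batch (hypothetical).

    batch: list of dicts with keys 'query', 'item', 'label' (1 = click).
    emb: dict mapping node id -> embedding (list of floats).
    ctr_logits: one CTR logit per sample.
    Assumed sampling: positives are clicked (query, item) pairs in the batch,
    negatives are other items drawn from the same batch (no graph engine).
    """
    if rng is None:
        rng = random.Random(0)
    ctr = sum(bce_with_logits(z, s["label"]) for z, s in zip(ctr_logits, batch)) / len(batch)
    items = [s["item"] for s in batch]
    graph_terms = []
    for s in batch:
        if s["label"] != 1:
            continue  # only clicked pairs act as graph positives
        q = emb[s["query"]]
        loss = -log_sigmoid(dot(q, emb[s["item"]]))  # pull the positive pair together
        negatives = [i for i in items if i != s["item"]]
        for neg in rng.sample(negatives, min(num_neg, len(negatives))):
            loss -= log_sigmoid(-dot(q, emb[neg]))  # push in-batch negatives apart
        graph_terms.append(loss)
    graph = sum(graph_terms) / len(graph_terms) if graph_terms else 0.0
    return ctr + lam * graph, ctr, graph
```

Because both terms come from one batch, a single backward pass would update the CTR model and the graph embeddings together, which is the property that makes this style of training cheap to deploy.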

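One way to picture how graph embeddings can expose "various similarity relationships between query and item" is to turn them into similarity features for the CTR model. The hypothetical `interest_features` below computes cosine similarities between the current query and the target item, between the target item and the user's historical items, and between the query and historical queries. The feature set and names are assumptions, not the architecture of EGIN's multi-interest network.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def interest_features(emb, query, target_item, hist_items, hist_queries):
    """Hypothetical graph-embedding similarity features for a CTR model."""
    item_sims = [cosine(emb[target_item], emb[i]) for i in hist_items]
    query_sims = [cosine(emb[query], emb[q]) for q in hist_queries]
    return {
        "query_item": cosine(emb[query], emb[target_item]),  # query vs. candidate item
        "item_hist_max": max(item_sims, default=0.0),  # strongest match to past items
        "item_hist_mean": sum(item_sims) / len(item_sims) if item_sims else 0.0,
        "query_hist_max": max(query_sims, default=0.0),  # strongest match to past queries
    }
```

These scores would be concatenated with the other CTR inputs; a learned interaction network could replace the fixed cosine scores, so treat this only as an intuition for the kinds of relationships involved.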