Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce
Houye Ji, Ye Tang, Zhaoxin Chen, Lixi Deng, Jun Hu, Lei Su
TL;DR
This work introduces a dual-graph representation for video-driven e-commerce and formulates user preference understanding as a bi-level graph matching problem. The proposed Graph Matching Network (GMN) performs node-level matching to relate videos and items and preference-level matching to align user preferences across modalities, yielding enhanced user embeddings and improved retrieval performance. Extensive offline and online evaluations show GMN outperforms state-of-the-art baselines with notable gains in AUC and CTR, and it has been deployed in a large-scale platform affecting hundreds of millions of users daily. The approach demonstrates the practicality of dual-graph modeling and bi-level matching for scalable, heterogeneous recommendation tasks in video-centric e-commerce.
Abstract
With the rapid development of the short video industry, traditional e-commerce has encountered a new paradigm, video-driven e-commerce, which leverages attractive videos for product showcases and provides both video and item services for users. Benefiting from the dynamic and visual introduction of items, video-driven e-commerce has shown huge potential in stimulating consumer confidence and promoting sales. In this paper, we focus on the video retrieval task, which faces the following challenges: (1) How to handle the heterogeneities among users, items, and videos? (2) How to mine the complementarity between items and videos for better user understanding? We first leverage a dual graph to model the co-existence of user-video and user-item interactions in video-driven e-commerce and reduce user preference understanding to a graph matching problem. To solve it, we further propose a novel bi-level Graph Matching Network (GMN), which mainly consists of node-level and preference-level graph matching. Given a user, node-level graph matching aims to match videos and items, while preference-level graph matching aims to match multiple user preferences extracted from both videos and items. The proposed GMN can then generate and improve the user embedding by aggregating matched nodes or preferences from the dual graph in a bi-level manner. Comprehensive experiments show the superiority of the proposed GMN, with significant improvements over state-of-the-art approaches (e.g., AUC +1.9% and CTR +7.15%). We have deployed it on a well-known video-driven e-commerce platform, serving hundreds of millions of users every day.
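To make the bi-level idea concrete, the following is a minimal NumPy sketch of how node-level and preference-level matching could be composed to produce a user embedding. It is an illustration under stated assumptions, not the GMN architecture itself: the abstract does not specify the matching functions, so scaled dot-product cross-attention is swapped in for both levels, and preference extraction is approximated by soft assignment to a set of hypothetical prototype vectors. All function names, the `prototypes` parameter, and the final mean pooling are assumptions introduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def node_level_match(videos, items):
    """Assumed node-level matching: cross-attention between a user's
    video nodes (n_v, d) and item nodes (n_i, d), with a residual sum."""
    scores = videos @ items.T / np.sqrt(videos.shape[1])    # (n_v, n_i)
    videos_matched = videos + softmax(scores, axis=1) @ items
    items_matched = items + softmax(scores.T, axis=1) @ videos
    return videos_matched, items_matched

def extract_preferences(nodes, prototypes):
    """Assumed preference extraction: softly assign each node to one of
    K hypothetical prototypes (K, d), then pool nodes per prototype."""
    assign = softmax(nodes @ prototypes.T, axis=1)           # (n, K)
    weights = assign / assign.sum(axis=0, keepdims=True)     # columns sum to 1
    return weights.T @ nodes                                 # (K, d)

def preference_level_match(video_prefs, item_prefs):
    """Assumed preference-level matching: each video-side preference
    attends over the item-side preferences and aggregates the match."""
    scores = video_prefs @ item_prefs.T / np.sqrt(video_prefs.shape[1])
    return video_prefs + softmax(scores, axis=1) @ item_prefs  # (K, d)

def user_embedding(videos, items, prototypes):
    """Compose both levels and mean-pool into one user vector (d,)."""
    v, i = node_level_match(videos, items)
    v_prefs = extract_preferences(v, prototypes)
    i_prefs = extract_preferences(i, prototypes)
    fused = preference_level_match(v_prefs, i_prefs)
    return fused.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    videos = rng.normal(size=(5, 8))      # 5 interacted videos, d = 8
    items = rng.normal(size=(7, 8))       # 7 interacted items
    prototypes = rng.normal(size=(3, 8))  # K = 3 hypothetical preferences
    print(user_embedding(videos, items, prototypes).shape)
```

In a retrieval setting, the resulting user vector would typically be scored against candidate video embeddings (for example by inner product); that step, and any learned parameters, are omitted here because the abstract does not describe them.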
