A Semi-supervised Multi-channel Graph Convolutional Network for Query Classification in E-commerce
Chunyuan Yuan, Ming Pang, Zheng Fang, Xue Jiang, Changping Peng, Zhangang Lin
TL;DR
This work tackles query intent classification in e-commerce by addressing long-tail category imbalances and unstable posterior labels derived from user clicks. It introduces SMGCN, a semi-supervised, multi-channel graph convolutional network that (i) learns joint query-category representations with a shared encoder, (ii) generates semi-supervised labels from query-category similarity, and (iii) fuses co-occurrence and semantic similarity graphs to produce robust category embeddings. The model combines posterior signals with semi-supervised guidance and leverages two category graphs to transfer information to tail categories, improving recall, especially for long-tail intents, as shown by strong offline results and online A/B gains on JD’s search engine. The approach is deployed in production, delivering measurable uplifts in user engagement and diversity of retrieved categories, and it highlights a practical pathway for large-scale, robust query intent classification in dynamic e-commerce settings. Future work includes integrating external knowledge such as hierarchical taxonomies to further enrich category representations.
Abstract
Query intent classification is an essential module that helps customers quickly find desired products in e-commerce applications. Most existing query intent classification methods rely on users' click behavior as the supervision signal for constructing training samples. However, methods based entirely on these posterior labels can suffer from serious category imbalance because of the Matthew effect in click samples. Compared with popular categories, products in long-tail categories struggle to obtain traffic and user clicks, so the models fail to detect users' intent for those products. This in turn makes it even harder for long-tail categories to obtain traffic, forming a vicious circle. In addition, because user clicks are partly random, the posterior labels of queries with similar semantics are unstable, which makes the model very sensitive to its input and leads to unstable and incomplete category recall. In this paper, we propose a novel Semi-supervised Multi-channel Graph Convolutional Network (SMGCN) that addresses these problems from the perspective of label association and semi-supervised learning. SMGCN extends category information and enhances the posterior labels by utilizing the similarity scores between the query and the categories. Furthermore, it leverages co-occurrence and semantic similarity graphs of categories to strengthen the relations among labels and weaken the influence of posterior label instability. We conduct extensive offline experiments and online A/B tests, and the results show that SMGCN significantly outperforms strong baselines, demonstrating its effectiveness and practicality.
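The two mechanisms named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The function names, the `threshold` and `weight` parameters, the max-based label blending, and the plain averaging of the two graph channels are all assumptions chosen for illustration. The graph step uses the standard symmetric normalization with self-loops and omits learned weights and nonlinearities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def enhance_labels(query_vec, category_vecs, posterior, threshold=0.7):
    """Assumed blending rule: keep each click-derived (posterior) label and
    add a soft label for any category whose similarity to the query reaches
    `threshold` (a hypothetical hyperparameter)."""
    enhanced = []
    for cat_vec, clicked in zip(category_vecs, posterior):
        sim = max(cosine(query_vec, cat_vec), 0.0)
        soft = sim if sim >= threshold else 0.0
        enhanced.append(max(clicked, soft))
    return enhanced


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a graph."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(adj_norm, feats):
    """One parameter-free propagation step: A_hat @ H."""
    n, d = len(feats), len(feats[0])
    return [[sum(adj_norm[i][k] * feats[k][c] for k in range(n)) for c in range(d)]
            for i in range(n)]


def fuse_channels(cooc_adj, sem_adj, feats, weight=0.5):
    """Assumed fusion: weighted average of the co-occurrence channel and the
    semantic-similarity channel after one propagation step each."""
    h_co = propagate(normalize_adjacency(cooc_adj), feats)
    h_se = propagate(normalize_adjacency(sem_adj), feats)
    return [[weight * a + (1.0 - weight) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(h_co, h_se)]


# Toy data: category 0 is a head category, category 2 is a tail category.
category_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query_vec = [1.0, 0.0]
posterior = [1.0, 0.0, 0.0]          # only category 0 was clicked

cooc_adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # categories 0 and 1 co-occur
sem_adj = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]    # categories 1 and 2 are similar

labels = enhance_labels(query_vec, category_vecs, posterior)
fused = fuse_channels(cooc_adj, sem_adj, category_vecs)
```

In this toy setup, the unclicked category 1 receives a soft label because it is close to the query. The tail category 2 is isolated in the co-occurrence graph, so its fused embedding moves toward category 1 only through the semantic channel. This mirrors the idea of transferring information to tail categories through a second graph.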
