ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users
Dakai Zhai, Jiong Gao, Boya Du, Junwei Xu, Qijie Shen, Jialin Zhu, Yuning Jiang
TL;DR
ChoirRec addresses CVR prediction for low-activity users by constructing high-quality semantic user groups via LLMs and leveraging them through a group-aware, dual-channel architecture. The method combines semantic-profile synthesis, hierarchical grouping with RQ-KMeans, and multi-faceted group priors (ID fusion, attribute completion, and group sequences) into a dual-channel predictor with asymmetric information injection and gated knowledge distillation. Empirical results on Taobao demonstrate offline GAUC gains for low-activity users and substantial online improvements in orders and GMV, validating the practical value of cross-user semantic transfer for sparse signals. The findings suggest that semantically grounded group knowledge can effectively bridge long-tail user gaps and improve real-world recommender performance.
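The summary names RQ-KMeans for hierarchical grouping but does not spell out the procedure. The sketch below assumes it refers to residual-quantization k-means: run k-means on the user embeddings, subtract each user's assigned centroid, then run k-means again on the residuals, so that each user receives a coarse-to-fine tuple of group codes. The function names (`kmeans`, `rq_kmeans`), the evenly spaced centroid initialization, and the toy 2-D embeddings are illustrative assumptions, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic init (evenly spaced points).

    Assumption: a production system would likely use k-means++ or similar.
    """
    n = len(points)
    centroids = [list(points[(i * n) // k]) for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda c: sq_dist(p, centroids[c]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p) for p in points]


def rq_kmeans(embeddings, k_per_level):
    """Assign each embedding one code per level by clustering residuals."""
    residuals = [list(e) for e in embeddings]
    codes = [[] for _ in embeddings]
    codebooks = []
    for k in k_per_level:
        centroids, assign = kmeans(residuals, k)
        codebooks.append(centroids)
        for i, a in enumerate(assign):
            codes[i].append(a)
            # The next level only sees what this level failed to explain.
            residuals[i] = [x - c for x, c in zip(residuals[i], centroids[a])]
    return [tuple(c) for c in codes], codebooks


# Hypothetical semantic embeddings for four users (two coarse groups).
user_embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
group_codes, codebooks = rq_kmeans(user_embeddings, k_per_level=[2, 2])
```

Under this reading, users who share the first code belong to the same coarse group, and each further code refines that group, which is one way a low-activity user could inherit priors from semantically similar users at several granularities.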
Abstract
Accurately predicting conversion rates (CVR) for low-activity users remains a fundamental challenge in large-scale e-commerce recommender systems. Existing approaches face three critical limitations: (i) reliance on noisy and unreliable behavioral signals; (ii) insufficient user-level information due to the lack of diverse interaction data; and (iii) a systemic training bias toward high-activity users that overshadows the needs of low-activity users. To address these challenges, we propose ChoirRec, a novel framework that leverages the semantic capabilities of Large Language Models (LLMs) to construct semantic user groups and enhance CVR prediction for low-activity users. With a dual-channel architecture designed for robust cross-user knowledge transfer, ChoirRec comprises three components: (i) a Semantic Group Generation module that utilizes LLMs to form reliable, cross-activity user clusters, thereby filtering out noisy signals; (ii) a Group-aware Hierarchical Representation module that enriches sparse user embeddings with informative group-level priors to mitigate data insufficiency; and (iii) a Group-aware Multi-granularity Module that employs a dual-channel architecture and an adaptive fusion mechanism to ensure effective learning and utilization of group knowledge. We conduct extensive offline and online experiments on Taobao, a leading industrial-scale e-commerce platform. ChoirRec improves GAUC by 1.16% in offline evaluations, while online A/B testing reveals a 7.24% increase in order volume, highlighting its substantial practical value in real-world applications.
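The offline metric reported above, GAUC, is not defined in this section. A minimal sketch follows, assuming the common definition: compute AUC separately for each user, then average the per-user values weighted by that user's number of samples, skipping users whose samples are all positive or all negative. The weighting choice, the function names, and the toy data are assumptions for illustration and may differ from the evaluation protocol actually used.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC; score ties count as half. None if only one class is present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gauc(user_ids, labels, scores):
    """Sample-count-weighted average of per-user AUC (assumed weighting)."""
    by_user = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        by_user[u][0].append(y)
        by_user[u][1].append(s)

    weighted_sum, total_weight = 0.0, 0
    for ys, ss in by_user.values():
        user_auc = auc(ys, ss)
        if user_auc is None:  # AUC is undefined without both classes
            continue
        weighted_sum += len(ys) * user_auc
        total_weight += len(ys)
    return weighted_sum / total_weight if total_weight else float("nan")


# Toy data: u1 is ranked perfectly, u2 is ranked backwards, u3 has no conversions.
users = ["u1", "u1", "u1", "u2", "u2", "u3", "u3"]
labels = [1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.3, 0.6, 0.5, 0.1]
print(gauc(users, labels, scores))  # 0.6 = (3 * 1.0 + 2 * 0.0) / 5
```

Because each user is scored in isolation, the metric measures how well conversions are ranked within a user's own candidates, so it can be restricted to the low-activity segment without high-activity users dominating the result.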
