LLMSeR: Enhancing Sequential Recommendation via LLM-based Data Augmentation
Yuqi Sun, Qidong Liu, Haiping Zhu, Feng Tian
TL;DR
LLMSeR tackles the long-tail user challenge in Sequential Recommender Systems (SRS) by injecting pseudo-prior items generated with LLMs. Its Semantic Interaction Augmentor (SIA) fuses collaborative signals via a Collaborative Candidate Generator and semantic cues via a Semantic Noise Filter. It then assesses pseudo-item reliability with Adaptive Reliability Validation (ARV) and combines augmented and original sequences through Dual-Channel Training with reliability-weighted losses. The approach uses a reverse-pretrained SRS to seed candidates, an LLM-guided reasoning process to validate pseudo-items, and a forward SRS with cosine-based similarity to filter hallucinations, which makes training robust to noisy augmentations. Empirical results across three backbones and three datasets show consistent improvements, especially for tail users, demonstrating the method’s generality and practical impact for real-world SRS deployment.
Abstract
Sequential Recommender Systems (SRS) have become a cornerstone of online platforms, leveraging users' historical interaction data to forecast their next potential engagement. Despite their widespread adoption, SRS often grapple with the long-tail user dilemma, resulting in less effective recommendations for individuals with limited interaction records. The advent of Large Language Models (LLMs), with their profound capability to discern semantic relationships among items, has opened new avenues for enhancing SRS through data augmentation. Nonetheless, current methodologies encounter obstacles, including the absence of collaborative signals and the prevalence of hallucination phenomena. In this work, we present LLMSeR, an innovative framework that utilizes LLMs to generate pseudo-prior items, thereby improving the efficacy of SRS. To alleviate the challenge of insufficient collaborative signals, we introduce the Semantic Interaction Augmentor (SIA), a method that integrates both semantic and collaborative information to comprehensively augment user interaction data. Moreover, to mitigate the adverse effects of hallucination in SRS, we develop Adaptive Reliability Validation (ARV), a validation technique designed to assess the reliability of the generated pseudo-items. Complementing these advancements, we also devise a Dual-Channel Training strategy, ensuring seamless integration of data augmentation into the SRS training process. Extensive experiments conducted with three widely used SRS models demonstrate the generalizability and efficacy of LLMSeR.
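As an illustration of how a cosine-based reliability score could feed a reliability-weighted loss across the two training channels, the following is a minimal sketch under stated assumptions, not the formulation used in the paper. The function names, the clipping of negative similarities to zero, and the additive combination of the two channel losses are all hypothetical choices made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reliability_weight(pseudo_item_emb: Sequence[float],
                       predicted_emb: Sequence[float]) -> float:
    """Hypothetical reliability score for one pseudo-item.

    Assumption: the score is the cosine similarity between the pseudo-item
    embedding and an embedding predicted by a forward SRS, clipped at 0 so
    that the weight lies in [0, 1].
    """
    return max(0.0, cosine_similarity(pseudo_item_emb, predicted_emb))


def dual_channel_loss(loss_original: float,
                      loss_augmented: float,
                      weight: float) -> float:
    """Hypothetical combination of the two channels.

    Assumption: the original-sequence loss is kept at full strength and the
    augmented-sequence loss is scaled by the reliability weight.
    """
    return loss_original + weight * loss_augmented


# Usage: a pseudo-item that disagrees with the forward prediction gets a
# low weight, so its augmented loss contributes little to the total.
w = reliability_weight([0.2, 0.9, 0.1], [0.1, 0.8, 0.3])
total = dual_channel_loss(loss_original=0.7, loss_augmented=1.4, weight=w)
```

Under these assumptions, a pseudo-item whose embedding points away from the forward prediction receives a weight of zero, so a hallucinated item leaves the original-sequence loss unchanged.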
