LLMSeR: Enhancing Sequential Recommendation via LLM-based Data Augmentation

Yuqi Sun, Qidong Liu, Haiping Zhu, Feng Tian

TL;DR

LLMSeR tackles the long-tail user challenge in Sequential Recommender Systems (SRS) by injecting pseudo-prior items generated with LLMs. Its Semantic Interaction Augmentor (SIA) fuses collaborative signals, supplied by a Collaborative Candidate Generator, with semantic cues, supplied by a Semantic Noise Filter. Adaptive Reliability Validation (ARV) then assesses the reliability of each pseudo item, and Dual-Channel Training combines the augmented and original sequences through reliability-weighted losses. The approach uses a reverse-pretrained SRS to seed candidates, an LLM-guided reasoning process to validate pseudo items, and a forward SRS with cosine-based similarity to filter hallucinations, which makes training robust to noisy augmentations. Empirical results across three backbones and three datasets show consistent improvements, especially for tail users, supporting the method's generality and its practical value for real-world SRS deployment.
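The cosine-based filtering and reliability-weighted loss mentioned above can be illustrated with a minimal sketch. This summary does not give the paper's exact formulas, so everything here is an assumption for illustration: the function names, the rescaling of cosine similarity to [0, 1], and the additive combination of the two channel losses are all hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reliability(pseudo_emb: Sequence[float], reference_emb: Sequence[float]) -> float:
    """Hypothetical reliability score: cosine similarity rescaled from [-1, 1] to [0, 1].

    `reference_emb` stands in for whatever representation the forward SRS
    produces; the actual comparison target is not specified in this summary.
    """
    return 0.5 * (cosine_similarity(pseudo_emb, reference_emb) + 1.0)


def dual_channel_loss(loss_original: float, loss_augmented: float, weight: float) -> float:
    """Assumed combination: original-channel loss plus the reliability-scaled augmented loss."""
    return loss_original + weight * loss_augmented


# A pseudo item aligned with the reference keeps its full weight in the loss,
# while an orthogonal one contributes only half as much.
w_aligned = reliability([1.0, 0.0], [1.0, 0.0])
w_orthogonal = reliability([1.0, 0.0], [0.0, 1.0])
total = dual_channel_loss(1.0, 2.0, w_orthogonal)
```

Under these assumptions, a low-reliability pseudo item is down-weighted rather than discarded, so a hallucinated item has limited influence on the gradient.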

Abstract

Sequential Recommender Systems (SRS) have become a cornerstone of online platforms, leveraging users' historical interaction data to forecast their next potential engagement. Despite their widespread adoption, SRS often grapple with the long-tail user dilemma, resulting in less effective recommendations for individuals with limited interaction records. The advent of Large Language Models (LLMs), with their profound capability to discern semantic relationships among items, has opened new avenues for enhancing SRS through data augmentation. Nonetheless, current methodologies encounter obstacles, including the absence of collaborative signals and the prevalence of hallucination phenomena. In this work, we present LLMSeR, an innovative framework that utilizes LLMs to generate pseudo-prior items, thereby improving the efficacy of SRS. To alleviate the challenge of insufficient collaborative signals, we introduce the Semantic Interaction Augmentor (SIA), a method that integrates both semantic and collaborative information to comprehensively augment user interaction data. Moreover, to weaken the adverse effects of hallucination in SRS, we develop Adaptive Reliability Validation (ARV), a validation technique designed to assess the reliability of the generated pseudo items. Complementing these advancements, we also devise a Dual-Channel Training strategy, ensuring seamless integration of data augmentation into the SRS training process. Extensive experiments conducted with three widely used SRS models demonstrate the generalizability and efficacy of LLMSeR.
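As a rough illustration of what augmenting a short history with pseudo-prior items could look like at the data level, the sketch below places generated items in front of a user's real interactions and records which positions are synthetic. The function name, the cap `max_pseudo`, and the choice to keep the pseudo items nearest to the real history when truncating are assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def prepend_pseudo_items(
    history: List[int], pseudo_items: List[int], max_pseudo: int
) -> Tuple[List[int], List[bool]]:
    """Return (augmented sequence, mask marking which positions are pseudo items).

    `pseudo_items` is assumed to be ordered oldest-first, so when more than
    `max_pseudo` are supplied, the ones closest to the real history are kept.
    """
    if max_pseudo < 0:
        raise ValueError("max_pseudo must be non-negative")
    kept = pseudo_items[max(0, len(pseudo_items) - max_pseudo):]
    augmented = kept + history
    is_pseudo = [True] * len(kept) + [False] * len(history)
    return augmented, is_pseudo


# A tail user with two real interactions receives at most two pseudo-prior items.
augmented, is_pseudo = prepend_pseudo_items([10, 11], [7, 8, 9], max_pseudo=2)
```

Keeping the mask alongside the sequence lets later stages, such as a reliability check or a weighted loss, treat synthetic positions differently from real ones.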

Paper Structure

This paper contains 14 sections, 8 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 2: The architecture of the proposed LLMSeR, illustrated with two users as an example.
  • Figure 3: Performance on the Book dataset with grouped users. "short" refers to users with interaction lengths in the range $(0, 4)$, "medium" to users with interaction lengths in the range $[4, 6)$, and "long" to users with interaction lengths in the range $[6, \infty)$.
  • Figure 4: Experimental results for the number of pseudo items $M$ per user. All experiments are conducted on the Fashion dataset.
  • Figure 5: Experimental results for the weight decay coefficient $\beta$ on the Fashion dataset with the SASRec backbone.
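
The user grouping described in the Figure 3 caption can be written as a small helper. The interval boundaries come from the caption; the function name and the handling of non-positive lengths are assumptions made for this sketch.

```python
def length_group(num_interactions: int) -> str:
    """Map a user's interaction count to the groups used in the Figure 3 caption.

    short:  (0, 4)   -> 1 to 3 interactions
    medium: [4, 6)   -> 4 or 5 interactions
    long:   [6, inf) -> 6 or more interactions
    """
    if num_interactions <= 0:
        raise ValueError("a user must have at least one interaction")
    if num_interactions < 4:
        return "short"
    if num_interactions < 6:
        return "medium"
    return "long"


# Group a few hypothetical users by the length of their interaction history.
groups = {user: length_group(n) for user, n in {"u1": 3, "u2": 4, "u3": 9}.items()}
```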