Data Augmentation for Sequential Recommendation: A Survey
Yizhou Dang, Enneng Yang, Yuting Liu, Guibing Guo, Linying Jiang, Jianzhe Zhao, Xingwei Wang
TL;DR
Data sparsity challenges in sequential recommendation motivate a comprehensive survey of data augmentation (DA) methods. The paper classifies DA into heuristic-based and model-based approaches, detailing data-level and representation-level techniques, and contrasting sequence extension, refining, generation, and LLM-based strategies. Empirical analysis across multiple datasets suggests model-based augmentation often outperforms heuristic methods, though trade-offs in cost and scalability remain; hybrid and scenario-specific strategies can yield strong gains. The work highlights gaps in theory, evaluation metrics for augmented data quality, and automated, generalizable DA methods, pointing to LLM-driven augmentation as a promising frontier.
Abstract
As an essential branch of recommender systems, sequential recommendation (SR) has received much attention because it closely matches real-world scenarios. However, widespread data sparsity limits the performance of SR models. Researchers have therefore proposed many data augmentation (DA) methods to mitigate this issue and have made considerable progress. In this survey, we provide a comprehensive review of DA methods for SR. We start by introducing the research background and motivation. Then, we categorize existing methods by their augmentation principles, objects, and purposes. Next, we compare their advantages and disadvantages, then present and analyze representative experimental results. Finally, we outline directions for future research and summarize this survey. We also maintain a repository with a paper list at https://github.com/KingGugu/DA-CL-4Rec.
