Data Imputation using Large Language Model to Accelerate Recommendation System
Zhicheng Ding, Jiahao Tian, Zhenkai Wang, Jinman Zhao, Siyang Li
TL;DR
This work tackles data sparsity in recommender systems by using a fine-tuned Large Language Model (LLM) to impute missing values, enriching the tabular data used for recommendations. The method applies LoRA-based parameter-efficient fine-tuning on the complete data, then prompts the LLM to fill in missing attributes, producing a merged dataset that is fed into a downstream recommender (DLRM). Experiments on AdClick and MovieLens show that LLM-based imputation is competitive with traditional imputation baselines and often superior to them, especially on multi-class classification and regression tasks. The results suggest that LLMs can meaningfully address data sparsity in big-data contexts, improving personalization and system robustness in recommender applications.
Abstract
This paper addresses the challenge of sparse and missing data in recommendation systems, a significant hurdle in the age of big data. Traditional imputation methods struggle to capture complex relationships within the data. We propose a novel approach that fine-tunes a Large Language Model (LLM) and uses it to impute missing data for recommendation systems. An LLM, which is trained on vast amounts of text, is able to understand complex relationships among data and intelligently fill in missing information. The enriched data is then used by the recommendation system to generate more accurate and personalized suggestions, ultimately enhancing the user experience. We evaluate our LLM-based imputation method against traditional data imputation methods across various tasks within the recommendation system domain, including single classification, multi-class classification, and regression. By demonstrating the superiority of LLM imputation over traditional methods, we establish its potential for improving recommendation system performance.
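
The impute-then-merge flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the row layout, the prompt wording, the helper names (`split_rows`, `build_prompt`, `impute_rows`), and the `stub_llm` stand-in are all assumptions made for illustration. In the approach described here, the fill function would be an LLM fine-tuned on the complete rows rather than a stub.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A row maps column names to string values; None marks a missing value.
Row = Dict[str, Optional[str]]


def split_rows(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate complete rows (candidate fine-tuning data) from rows to impute."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    incomplete = [r for r in rows if any(v is None for v in r.values())]
    return complete, incomplete


def build_prompt(row: Row, target: str) -> str:
    """Turn the known values of a row into a question about the target column.

    The template is hypothetical; the source does not specify its wording.
    """
    known = ", ".join(f"{k} is {v}" for k, v in row.items() if v is not None)
    return f"Given that {known}, what is {target}?"


def impute_rows(rows: List[Row], fill: Callable[[str], str]) -> List[Row]:
    """Return copies of all rows with each missing value replaced by fill(prompt)."""
    merged = []
    for row in rows:
        filled = dict(row)  # copy so the input rows stay unchanged
        for col, val in row.items():
            if val is None:
                filled[col] = fill(build_prompt(row, col))
        merged.append(filled)
    return merged


def stub_llm(prompt: str) -> str:
    """Placeholder for the fine-tuned LLM; always returns a fixed answer."""
    return "unknown"


rows: List[Row] = [
    {"user_id": "1", "genre": "drama", "rating": "4"},
    {"user_id": "2", "genre": None, "rating": "5"},
]
complete, incomplete = split_rows(rows)
merged = impute_rows(rows, stub_llm)
```

Passing the fill function as a parameter keeps the sketch independent of any particular model: the merged rows, not the stub, are what a downstream recommender would consume.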
