RecGPT: Generative Pre-training for Text-based Recommendation
Hoang Ngo, Dat Quoc Nguyen
TL;DR
RecGPT introduces the first domain-adapted, fully trained large language models for text-based recommendation, comprising RecGPT-7B and its instruction-following variant RecGPT-7B-Instruct. Pre-trained on a 20.5B-token corpus and fine-tuned with 100K+ instructional prompts, RecGPT-7B-Instruct achieves state-of-the-art results on rating prediction and competitive performance on sequential recommendation across multiple benchmarks. The authors provide a transparent evaluation framework, address data leakage concerns, and publish both the models and their datasets to accelerate research and downstream applications. The work highlights the efficacy of continual domain-specific pre-training plus instruction-following fine-tuning in improving LLM-based recommendation systems and establishes a practical, scalable approach for text-based user modeling.
Abstract
We present the first domain-adapted and fully trained large language model, RecGPT-7B, and its instruction-following variant, RecGPT-7B-Instruct, for text-based recommendation. Experimental results on rating prediction and sequential recommendation tasks show that our model, RecGPT-7B-Instruct, outperforms previous strong baselines. We are releasing our RecGPT models as well as their pre-training and fine-tuning datasets to facilitate future research and downstream applications in text-based recommendation. Public Hugging Face links to our RecGPT models and datasets are available at: https://github.com/VinAIResearch/RecGPT
