RecGPT: Generative Pre-training for Text-based Recommendation

Hoang Ngo; Dat Quoc Nguyen

TL;DR

RecGPT introduces the first domain-adapted, fully trained large language models for text-based recommendation, comprising RecGPT-7B and its instruction-following variant RecGPT-7B-Instruct. Pre-trained on a 20.5B-token corpus and fine-tuned with 100K+ instructional prompts, RecGPT-7B-Instruct achieves state-of-the-art results on rating prediction and competitive performance on sequential recommendation across multiple benchmarks. The authors provide a transparent evaluation framework, address data leakage concerns, and publish both the models and their datasets to accelerate research and downstream applications. The work highlights the efficacy of continual domain-specific pre-training plus instruction-following fine-tuning in improving LLM-based recommendation systems and establishes a practical, scalable approach for text-based user modeling.

Abstract

We present the first domain-adapted and fully trained large language model, RecGPT-7B, and its instruction-following variant, RecGPT-7B-Instruct, for text-based recommendation. Experimental results on rating prediction and sequential recommendation tasks show that our model, RecGPT-7B-Instruct, outperforms previous strong baselines. We are releasing our RecGPT models as well as their pre-training and fine-tuning datasets to facilitate future research and downstream applications in text-based recommendation. Public Hugging Face links to our RecGPT models and datasets are available at: https://github.com/VinAIResearch/RecGPT
