HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling
Junyi Chen, Lu Chi, Bingyue Peng, Zehuan Yuan
TL;DR
HLLM introduces a hierarchical two-LLM architecture to enhance sequential recommendations by converting rich item text into compact embeddings via an Item LLM and modeling user interests from those embeddings with a User LLM. The approach demonstrates that pre-trained LLM weights are valuable for recommendation, while task-specific fine-tuning remains essential, and it scales effectively to multi-billion-parameter configurations. Across PixelRec and Amazon Books, HLLM achieves state-of-the-art offline performance and confirms practical value through online A/B tests, with item embedding caching improving training and serving efficiency. The work highlights strong applicability for industrial deployments, balancing accuracy, scalability, and efficiency.
Abstract
Large Language Models (LLMs) have achieved remarkable success in various fields, prompting several studies to explore their potential in recommendation systems. However, these attempts have so far yielded only modest improvements over traditional recommendation models. Moreover, three critical questions remain under-explored: first, the real value of LLMs' pre-trained weights, which are often considered to encapsulate world knowledge; second, the necessity of fine-tuning for recommendation tasks; third, whether LLMs exhibit the same scalability benefits in recommendation systems as they do in other domains. In this paper, we propose a novel Hierarchical Large Language Model (HLLM) architecture designed to enhance sequential recommendation systems. Our approach employs a two-tier model: the first tier, an Item LLM, extracts rich content features from the detailed text description of each item, while the second tier, a User LLM, uses these features to predict users' future interests from their interaction history. Extensive experiments demonstrate that our method effectively leverages the pre-trained capabilities of open-source LLMs, and that further fine-tuning leads to significant performance boosts. Additionally, HLLM achieves excellent scalability, with the largest configuration using 7B parameters for both item feature extraction and user interest modeling. HLLM also offers excellent training and serving efficiency, making it practical for real-world applications. Evaluations on two large-scale datasets, PixelRec and Amazon Reviews, show that HLLM achieves state-of-the-art results, outperforming traditional ID-based models by a wide margin. In online A/B testing, HLLM shows notable gains, validating its practical impact in real-world recommendation scenarios. Code is available at https://github.com/bytedance/HLLM.
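
The two-tier data flow described above (item text to item embedding, then a sequence of item embeddings to a user interest vector) can be illustrated with a minimal sketch. It is not the HLLM implementation: `toy_item_encoder` and `toy_user_model` are hypothetical stand-ins for the Item LLM and User LLM, the embedding width `DIM`, the hashing trick, the recency weighting, and the dot-product scoring are all assumptions made for illustration, and `ItemEmbeddingCache` only shows the general idea of computing item embeddings once and reusing them.

```python
import hashlib
import math
from typing import Dict, List

DIM = 8  # toy embedding width (assumption; real LLM hidden sizes are far larger)


def toy_item_encoder(text: str) -> List[float]:
    """Hypothetical stand-in for the Item LLM: item text -> unit-length vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 - 0.5 for b in digest[:DIM]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ItemEmbeddingCache:
    """Encodes each item at most once and reuses the stored embedding."""

    def __init__(self, catalog: Dict[str, str]) -> None:
        self.catalog = catalog            # item id -> text description
        self.store: Dict[str, List[float]] = {}
        self.encoder_calls = 0            # counts how often the encoder really runs

    def get(self, item_id: str) -> List[float]:
        if item_id not in self.store:
            self.encoder_calls += 1
            self.store[item_id] = toy_item_encoder(self.catalog[item_id])
        return self.store[item_id]


def toy_user_model(history_embs: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for the User LLM: recency-weighted mean of embeddings."""
    if not history_embs:
        raise ValueError("interaction history must not be empty")
    weights = [i + 1 for i in range(len(history_embs))]  # later items weigh more
    total = sum(weights)
    return [
        sum(w * emb[d] for w, emb in zip(weights, history_embs)) / total
        for d in range(DIM)
    ]


def recommend(history: List[str], cache: ItemEmbeddingCache, k: int = 2) -> List[str]:
    """Rank unseen items by dot product with the predicted user interest vector."""
    user_vec = toy_user_model([cache.get(item_id) for item_id in history])
    seen = set(history)
    scored = [
        (sum(u * v for u, v in zip(user_vec, cache.get(item_id))), item_id)
        for item_id in cache.catalog
        if item_id not in seen
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Usage with a made-up four-item catalog.
catalog = {
    "a": "Wireless noise-cancelling headphones",
    "b": "Bluetooth portable speaker",
    "c": "Hardcover fantasy novel",
    "d": "USB-C charging cable",
}
cache = ItemEmbeddingCache(catalog)
print(recommend(["a", "b"], cache, k=2))
print("encoder calls:", cache.encoder_calls)
```

The point of the sketch is the separation of concerns: the user-side model never sees raw text, only fixed-size item vectors, which is what makes it possible to store item embeddings and skip re-encoding on later requests.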
