CHIME: A Compressive Framework for Holistic Interest Modeling

Yong Bai, Rui Xiang, Kaiyuan Li, Yongxiang Tang, Yanhua Cheng, Xialong Liu, Peng Jiang, Kun Gai

TL;DR

CHIME tackles scalable holistic long-term interest modeling by compressing full behavior sequences into compact histograms using an Interest Adaptation Module, an Interest Representation Module based on pretrained decoder-only LLMs, and an Interest Compression Module with residual vector quantization. It introduces holistic and immediate contrastive losses to align global and recent interests and demonstrates that pretrained LLM initialization improves performance as model depth grows. Experiments on MicroVideo, Tmall, and EBNeRD show consistent CTR/CVR gains and that CHIME can be integrated with existing ranking models and other long-term methods to reduce online computation. The work offers a practical, end-to-end, plug-and-play solution for industrial recommendation systems requiring scalable holistic user modeling.

Abstract

Modeling holistic user interests is important for improving recommendation systems but is challenged by high computational cost and difficulty in handling diverse information with full behavior context. Existing search-based methods might lose critical signals during behavior selection. To overcome these limitations, we propose CHIME: A Compressive Framework for Holistic Interest Modeling. It uses adapted large language models to encode complete user behaviors with heterogeneous inputs. We introduce multi-granular contrastive learning objectives to capture both persistent and transient interest patterns and apply residual vector quantization to generate compact embeddings. CHIME demonstrates superior ranking performance across diverse datasets, establishing a robust solution for scalable holistic interest modeling in recommendation systems.
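
The abstract names residual vector quantization as the compression step. As a rough illustration of that general technique only, the sketch below quantizes one vector with a stack of codebooks, each level encoding what the previous levels left over. The function name `rvq_encode`, the random codebooks, and the sizes (8 dimensions, 3 levels, 16 codes per level) are assumptions made for this example and are not taken from the paper, where codebooks would be learned.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization of a single vector.

    x:         array of shape (D,)
    codebooks: list of arrays, each of shape (K, D), one per level
    Returns the chosen code index per level, the reconstruction,
    and the residual left after the last level.
    """
    residual = np.asarray(x, dtype=float).copy()
    recon = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Pick the codeword closest to what is still unexplained.
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        recon += codebook[idx]
        residual -= codebook[idx]
    return indices, recon, residual

# Hypothetical setup: random codebooks stand in for learned ones.
rng = np.random.default_rng(0)
dim, levels, codes = 8, 3, 16
codebooks = [rng.normal(scale=0.5 ** level, size=(codes, dim))
             for level in range(levels)]
x = rng.normal(size=dim)

indices, recon, residual = rvq_encode(x, codebooks)
print("code indices:", indices)
print("reconstruction error:", float(np.linalg.norm(residual)))
```

The vector is then represented by a few small integers (one index per level) instead of a dense float vector, which is what makes this family of methods attractive for storing precomputed user representations.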

Paper Structure

This paper contains 17 sections, 3 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison of search-based and compression-based frameworks: (1) Effectiveness: The search-based framework relies on simple top-k similarity retrieval, while the compression-based framework leverages richer representations incorporating all behaviors to better capture global interests; (2) Efficiency: The search-based framework requires per-target online retrieval, whereas the compression-based framework precomputes user representations offline at low frequencies, enabling efficient online serving.
  • Figure 2: Framework of the compression model, comprising three modules: (1) Interest Adaptation Module (IAM), integrating heterogeneous information; (2) Interest Representation Module (IRM), tuning a pretrained LLM backbone to capture long-term interests through holistic and immediate contrastive losses; and (3) Interest Compression Module (ICM), compressing interest distribution with residual quantization.
  • Figure 3: Illustration of holistic loss. Behaviors from other users are considered negative samples. Future behaviors are taken as positive or negative samples based on their labels.
  • Figure 4: Holistic loss on the evaluation set over the course of training. With random initialization, deeper models overfit more severely, whereas with pretrained LLM initialization, deeper models perform better. Overall, pretrained LLM initialization outperforms random initialization.
  • Figure 5: The t-SNE visualization of compressed interest representations reveals distinct clusters, where the colorbar indicates behavior length from purple (shorter) to yellow (longer).
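
The caption of Figure 3 describes how samples are assigned for the holistic loss: other users' behaviors are negatives, and a user's future behaviors are positives or negatives according to their labels. The exact loss is not reproduced in this summary, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss with that sample assignment. The function name `contrastive_loss`, the cosine similarity, and the temperature of 0.1 are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def contrastive_loss(user_vec, item_vecs, is_positive, temperature=0.1):
    """InfoNCE-style loss for one user (illustrative, not the paper's loss).

    user_vec:    array (D,), the user's interest representation
    item_vecs:   array (N, D), candidate behaviors: the user's future
                 behaviors plus behaviors sampled from other users
    is_positive: bool array (N,), True only for future behaviors with
                 a positive label; everything else is a negative
    """
    mask = np.asarray(is_positive, dtype=bool)
    u = user_vec / np.linalg.norm(user_vec)
    v = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    logits = (v @ u) / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    # Share of softmax mass on positives; the loss is its negative log.
    return float(-np.log(weights[mask].sum() / weights.sum()))

# Toy example with hand-picked 2-D vectors.
user = np.array([1.0, 0.0])
items = np.array([
    [1.0, 0.0],    # points the same way as the user
    [0.0, 1.0],    # orthogonal to the user
    [-1.0, 0.0],   # points the opposite way
])
aligned_loss = contrastive_loss(user, items, [True, False, False])
misaligned_loss = contrastive_loss(user, items, [False, False, True])
print(aligned_loss, misaligned_loss)
```

When the positive item points the same way as the user vector the loss is close to zero, and when the positive is the opposite-facing item the loss is large, which is the behavior a contrastive objective of this kind is meant to reward and penalize.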