QARM V2: Quantitative Alignment Multi-Modal Recommendation for Reasoning User Sequence Modeling

Tian Xia, Jiaqi Zhang, Yueyang Liu, Hongjian Dou, Tingya Yin, Jiangxia Cao, Xulei Liang, Tianlu Xie, Lihao Liu, Xiang Chen, Shen Wang, Changxin Lao, Haixiang Gan, Jinkai Yu, Keting Cen, Lu Hao, Xu Zhang, Qiqiang Zhong, Zhongbo Sun, Yiyu Wang, Shuang Yang, Mingxin Wen, Xiangyu Wu, Shaoguo Liu, Tingting Gao, Zhaojie Liu, Han Li, Kun Gai

TL;DR

QARM V2 tackles the limitations of ID-based industrial RecSys by aligning LLM-derived semantic signals with business objectives for lifelong user sequence modeling. It introduces reasoning-based item alignment and a hybrid Res-KmeansFSQ quantization that produces both embeddings for the General Search Unit (GSU) and multi-level Semantic IDs for the Exact Search Unit (ESU), enabling end-to-end optimization. Empirical results across multiple industrial domains show consistent offline improvements in AUC/GAUC and notable online gains in revenue, GMV, and engagement, validating the approach's practical impact. The work demonstrates that business-aware LLM representations can significantly enhance both retrieval and ranking in real-world, long-tail, multi-modal RecSys ecosystems.

Abstract

With the evolution of large language models (LLMs), there is growing interest in leveraging their rich semantic understanding to enhance industrial recommendation systems (RecSys). Traditional RecSys relies on ID-based embeddings for user sequence modeling in the General Search Unit (GSU) and Exact Search Unit (ESU) paradigm, which suffers from low information density, knowledge isolation, and weak generalization ability. While LLMs offer complementary strengths with dense semantic representations and strong generalization, directly applying LLM embeddings to RecSys faces two critical challenges: the representations do not match business objectives, and they cannot be learned end-to-end with downstream tasks. In this paper, we present QARM V2, a unified framework that bridges LLM semantic understanding with RecSys business requirements for user sequence modeling.
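
The abstract names the GSU/ESU paradigm without spelling it out. As a rough illustration only, assuming the common two-stage reading (the GSU cheaply narrows a long behavior history to the top-k items most relevant to the target, and the ESU then models that short list with attention), a minimal Python sketch follows. The function names, the dot-product scoring, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gsu_top_k(target, history, k):
    """GSU stage (assumed form): indices of the k history items most similar to the target."""
    ranked = sorted(range(len(history)),
                    key=lambda i: dot(target, history[i]),
                    reverse=True)
    return ranked[:k]


def esu_attention(target, selected):
    """ESU stage (assumed form): softmax-attention pooling over the retrieved items."""
    scores = [dot(target, v) for v in selected]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[d] for w, v in zip(weights, selected))
            for d in range(len(target))]


# Toy item embeddings standing in for a (much longer) user history.
history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
target = [1.0, 0.0]

kept = gsu_top_k(target, history, k=2)
interest = esu_attention(target, [history[i] for i in kept])
```

In this reading, the quality of the item embeddings drives the GSU step directly, which is presumably why the choice of representation matters for long sequences.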

Paper Structure

This paper contains 16 sections, 4 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Differences between ID-based RecSys models and LLMs.
  • Figure 2: The LLM's understanding and reasoning ability is needed to filter noisy item pairs produced by RecSys models: it finds unrelated item pairs biased toward hot, popular items in the Item2Item model, and identifies different but relevant item pairs in the User2Item model.
  • Figure 3: K-means is a distribution-dependent quantization method that is strongly affected by the training data; points at the edge of the distribution are pulled toward the dense region. FSQ is a distribution-independent quantization method that preserves each data point's position.
  • Figure 4: In the data pipeline, we first use the latest LLM to judge the relevance of item pairs from their titles, and then generate the corresponding question-answer pairs from each item's visual/text data. In LLM fine-tuning, we modify the attention mask to split the input token sequence into three segments, and then run contrastive and generation tasks to fine-tune the LLM backbone, obtaining item semantic embeddings with appropriate business knowledge.
  • Figure 5: Visualization of exclusive retrieved items.
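
The contrast drawn in the Figure 3 caption can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the paper's Res-KmeansFSQ: `fsq_quantize` uses the usual finite-scalar-quantization recipe (bound each dimension with tanh, then round to a fixed grid), `kmeans_1d` is plain Lloyd's k-means on scalars, and the function names, level count, and toy data are all hypothetical.

```python
import math


def fsq_quantize(vec, levels=5):
    """Bound each dimension with tanh, then round to one of `levels` fixed
    grid points. No statistics of the data are used."""
    half = (levels - 1) / 2  # `levels` assumed odd, so the grid is symmetric around 0
    return [round(math.tanh(x) * half) for x in vec]


def kmeans_1d(points, centroids, iters=10):
    """Plain Lloyd's k-means on scalars; the centroids are fitted to `points`."""
    centroids = list(centroids)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster received no points.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids


dense = [0.0, 0.1, 0.2, 0.1, 0.0]
with_edge = dense + [3.0]  # one point far from the dense region

# K-means: the learned codebook changes when the training data changes.
codebook_dense = kmeans_1d(dense, centroids=[0.0, 0.2])
codebook_edge = kmeans_1d(with_edge, centroids=[0.0, 0.2])

# FSQ: the code of a vector depends only on the vector itself.
codes = fsq_quantize([0.1, 3.0, -3.0])
```

Here the k-means codebook is a function of whichever points it was trained on, while the FSQ grid is fixed in advance; a hybrid scheme would presumably aim to combine the two behaviors.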