Tag-Enriched Multi-Attention with Large Language Models for Cross-Domain Sequential Recommendation

Wangyu Wu, Xuhang Chen, Zhenhong Chen, Jing-En Jiang, Kim-Fung Tsang, Xiaowei Huang, Fei Ma, Jimin Xiao

TL;DR

Cross-domain sequential recommendation (CDSR) suffers from data sparsity and cross-domain misalignment, both of which limit how well multimodal item information can be used. The authors present TEMA-LLM, a framework that uses LLMs to generate domain-aware semantic tags from item titles and descriptions and fuses the tag embeddings with ID, textual, and visual features through a Tag-Enriched Multi-Attention architecture. The approach jointly models intra- and inter-domain user preferences via a four-stage pipeline: self-refined tag list generation, tag embedding, multimodal item construction, and hierarchical attention. It achieves state-of-the-art results on four large-scale CDSR datasets. Ablation studies confirm the contributions of LLM-driven tagging, weighted tag fusion, and the multi-attention design, while offline tag generation and CLIP feature extraction keep the method practical. Overall, the work demonstrates the potential of LLM-based semantic tagging to improve cross-domain alignment and personalized, multimodal recommendation on consumer-facing platforms.

Abstract

Cross-Domain Sequential Recommendation (CDSR) plays a crucial role in modern consumer electronics and e-commerce platforms, where users interact with diverse services such as books, movies, and online retail products. These systems must accurately capture both domain-specific and cross-domain behavioral patterns to provide personalized and seamless consumer experiences. To address this challenge, we propose TEMA-LLM (Tag-Enriched Multi-Attention with Large Language Models), a practical and effective framework that integrates Large Language Models (LLMs) for semantic tag generation and enrichment. Specifically, TEMA-LLM employs LLMs to assign domain-aware prompts and generate descriptive tags from item titles and descriptions. The resulting tag embeddings are fused with item identifiers as well as textual and visual features to construct enhanced item representations. A Tag-Enriched Multi-Attention mechanism is then introduced to jointly model user preferences within and across domains, enabling the system to capture complex and evolving consumer interests. Extensive experiments on four large-scale e-commerce datasets demonstrate that TEMA-LLM consistently outperforms state-of-the-art baselines, underscoring the benefits of LLM-based semantic tagging and multi-attention integration for consumer-facing recommendation systems. The proposed approach highlights the potential of LLMs to advance intelligent, user-centric services in the field of consumer electronics.

Paper Structure

This paper contains 21 sections, 21 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: (a) Traditional CDSR models primarily rely on item basic features. (b) Our proposed TEMA-LLM enriches item representations by generating semantic tags from LLMs and integrating them with textual and visual features.
  • Figure 2: Overview of the proposed TEMA-LLM framework. The Feature Preparation module generates multimodal item embeddings using a learnable ID matrix, a frozen CLIP image encoder, and a CLIP text encoder applied to LLM-augmented text. Enriched tag embeddings are fused into the item representation via weighted multi-hot encoding. These representations are processed by multi-head attention layers to model intra- and inter-sequence user preferences. Finally, cosine similarity with candidate embeddings is used for next-item prediction.
  • Figure 3: Overview of the prompt-based enhancement pipeline in our framework. Given a movie item, we first generate prompts that guide a Large Language Model (LLM) to produce semantic tag information. Subsequently, the same LLM is used to compute matching scores between the item and each tag, enabling the selection of relevant tags for embedding generation. This two-stage LLM-based process enriches item representations with structured semantic knowledge for downstream recommendation.
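
The captions for Figures 2 and 3 describe three steps that a small example may make more concrete: selecting tags by LLM matching score, fusing the selected tag embeddings through weighted multi-hot encoding, and ranking candidates by cosine similarity. The following is a minimal pure-Python sketch of those steps, not the authors' implementation. The function names (`fuse_tags`, `cosine`, `rank_candidates`), the score threshold, and the normalization by total weight are all assumptions made for illustration; the summary above does not specify how the paper sets them.

```python
import math


def fuse_tags(tag_scores, tag_embeddings, threshold=0.5):
    """Weighted multi-hot fusion of tag embeddings (illustrative sketch).

    tag_scores:     dict tag -> matching score (assumed to come from the LLM)
    tag_embeddings: non-empty dict tag -> embedding (list of floats)
    threshold:      hypothetical cut-off for keeping a tag (an assumption)
    """
    dim = len(next(iter(tag_embeddings.values())))
    fused = [0.0] * dim
    total_weight = 0.0
    for tag, score in tag_scores.items():
        # Skip tags judged irrelevant or missing from the embedding table.
        if score < threshold or tag not in tag_embeddings:
            continue
        total_weight += score
        for i, value in enumerate(tag_embeddings[tag]):
            fused[i] += score * value
    # Normalizing by the total weight is an assumption, not taken from the paper.
    if total_weight > 0:
        fused = [value / total_weight for value in fused]
    return fused


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(user_vec, candidates):
    """Return candidate names sorted by descending cosine similarity."""
    return sorted(
        candidates,
        key=lambda name: cosine(user_vec, candidates[name]),
        reverse=True,
    )


# Toy usage with made-up tags, scores, and 2-dimensional embeddings.
toy_embeddings = {"sci-fi": [1.0, 0.0], "romance": [0.0, 1.0], "space": [1.0, 1.0]}
toy_scores = {"sci-fi": 0.9, "romance": 0.1, "space": 0.6}
toy_fused = fuse_tags(toy_scores, toy_embeddings)
```

In the full framework the fused tag vector would be combined with the ID, CLIP text, and CLIP image embeddings before the attention layers; that combination step and the attention layers themselves are omitted here.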