HatLLM: Hierarchical Attention Masking for Enhanced Collaborative Modeling in LLM-based Recommendation

Yu Cui, Feng Liu, Jiawei Chen, Canghong Jin, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Can Wang

TL;DR

HatLLM addresses the difficulty that token-focused LLMs have in capturing collaborative signals for sequential recommendation by introducing a hierarchical attention masking scheme. The method assigns a distinct attention pattern to each group of layers: intra-item attention (IN) in shallow layers for learning individual item semantics, original attention (OR) in middle layers for token-level language modeling, and cross-item attention (CR) in deep layers for capturing cross-item collaborative signals. This lightweight, end-to-end approach yields an average 9.13% improvement over existing LLM-based methods across three real-world datasets, supporting its effectiveness in jointly modeling token-level and item-level dependencies. HatLLM thus serves as a plug-in enhancement for LLM-based recommender systems, strengthening the modeling of cross-item relationships without significant overhead.

Abstract

Recent years have witnessed a surge of research on leveraging large language models (LLMs) for sequential recommendation. LLMs have demonstrated remarkable potential in inferring users' nuanced preferences through fine-grained semantic reasoning. However, they also exhibit a notable limitation in effectively modeling collaborative signals, i.e., behavioral correlations inherent in users' historical interactions. Our empirical analysis further reveals that the attention mechanisms in LLMs tend to disproportionately focus on tokens within the same item, thereby impeding the capture of cross-item correlations. To address this limitation, we propose a novel hierarchical attention masking strategy for LLM-based recommendation, termed HatLLM. Specifically, in shallow layers, HatLLM masks attention between tokens from different items, facilitating intra-item semantic understanding; in contrast, in deep layers, HatLLM masks attention within items, thereby compelling the model to capture cross-item correlations. This progressive, layer-wise approach enables LLMs to jointly model both token-level and item-level dependencies. Extensive experiments on three real-world datasets demonstrate that HatLLM achieves significant performance gains (9.13% on average) over existing LLM-based methods.
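
The layer-wise masking described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `hierarchical_masks` and `mask_for_layer`, the parameters `n_shallow` and `n_deep`, and the choice to keep each token's attention to itself in the deep-layer mask (so that no row is left empty) are all hypothetical. The sketch uses the direct form of intra-item masking; Figure 4 indicates that the paper refines such a direct mask, and that refinement is not reproduced here.

```python
import numpy as np

def hierarchical_masks(item_ids):
    """Build boolean masks (True = attention allowed) for the three layer groups.

    item_ids[k] is the index of the item that token k belongs to
    (hypothetical input format; every token is assumed to belong to an item).
    """
    ids = np.asarray(item_ids)
    n = ids.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))   # query j may see keys k <= j
    same_item = ids[:, None] == ids[None, :]        # True where j and k share an item
    eye = np.eye(n, dtype=bool)

    in_mask = causal & same_item                    # shallow layers: intra-item only
    or_mask = causal                                # middle layers: ordinary causal attention
    # Deep layers: block intra-item attention. Self-attention is kept here
    # (an assumption of this sketch) so every query has at least one key.
    cr_mask = causal & (~same_item | eye)
    return in_mask, or_mask, cr_mask

def mask_for_layer(layer, num_layers, n_shallow, n_deep, masks):
    """Pick the mask for a 0-indexed layer given the sizes of the layer groups."""
    in_mask, or_mask, cr_mask = masks
    if layer < n_shallow:
        return in_mask
    if layer >= num_layers - n_deep:
        return cr_mask
    return or_mask

# Example: two items, the first tokenized into 2 tokens and the second into 3.
masks = hierarchical_masks([0, 0, 1, 1, 1])
shallow = mask_for_layer(0, num_layers=8, n_shallow=2, n_deep=2, masks=masks)
deep = mask_for_layer(7, num_layers=8, n_shallow=2, n_deep=2, masks=masks)
```

In practice such a boolean mask would be converted to an additive mask (0 where allowed, a large negative value where blocked) before the softmax in each attention layer. The number of shallow and deep layers is a hyperparameter, which Figure 5 of the paper analyzes.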

Paper Structure

This paper contains 24 sections, 5 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Illustration of LLM-based recommendation methods and their limitations in capturing cross-item correlations.
  • Figure 2: The total attention proportion and average attention weight comparison on intra-item and inter-item attentions.
  • Figure 3: The overall framework of the proposed HatLLM. It contains: 1) Intra-item Attention (IN) in shallow layers for learning individual item semantics; 2) Original Attention (OR) in middle layers for token-level language modeling; 3) Cross-item Attention (CR) in deep layers for capturing collaborative signals from cross-item correlations.
  • Figure 4: Illustration of the limitation of directly masking intra-item token attention with $M^{CR-pre}_{jk}$. The example shows the token attention aggregation for the item "The Winter Soldier": attention easily focuses on nearby items.
  • Figure 5: Hyperparameter sensitivity analysis on shallow and deep layer numbers.