Attention Calibration for Transformer-based Sequential Recommendation

Peilin Zhou; Qichen Ye; Yueqi Xie; Jingqi Gao; Shoujin Wang; Jae Boum Kim; Chenyu You; Sunghun Kim

TL;DR

This work probes why self-attention in transformer-based sequential recommenders (SRs) can focus on irrelevant history items, and attributes this to sub-optimal position encoding and noisy input. It introduces AC-TSR, a plug-in framework with two components: a Spatial Calibrator that injects explicit spatial cues (order and distance) into attention, and an Adversarial Calibrator that reweights attention according to each item's contribution to the prediction. Together they yield calibrated attention and better next-item predictions. Extensive experiments on four real-world benchmarks (Yelp and Amazon product categories) show consistent gains over state-of-the-art transformer SRs, and a lightweight variant preserves the backbone's speed. The approach offers a practical path to more robust and interpretable transformer-based recommendation, and it improves the alignment between attention weights and actual item importance.
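The Spatial Calibrator idea can be illustrated with a small sketch. It assumes, hypothetically, that order and distance enter as additive terms on the attention logits: one bias for items before the query position, another for items after it, and a linear penalty on their distance. The names `before_bias`, `after_bias`, and `distance_weight` are illustrative placeholders for learned parameters, not AC-TSR's published formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax; -inf logits (masked positions) get weight 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spatially_calibrated_attention(q, k, before_bias, after_bias,
                                   distance_weight, causal=True):
    """Attention weights whose logits add explicit order and distance terms.

    Hypothetical sketch: before_bias, after_bias and distance_weight stand in
    for learned parameters; this is not AC-TSR's exact spatial calibrator.
    q, k: one vector (list of floats) per sequence position.
    """
    n, d = len(q), len(q[0])
    rows = []
    for i in range(n):
        logits = []
        for j in range(n):
            if causal and j > i:
                # Causal mask: position i may not attend to future items.
                logits.append(float("-inf"))
                continue
            # Content term: scaled dot product, as in standard self-attention.
            content = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            # Order cue: separate bias for items before vs. after position i.
            if j < i:
                order = before_bias
            elif j > i:
                order = after_bias
            else:
                order = 0.0
            # Distance cue: penalize items far from position i.
            distance = -distance_weight * abs(i - j)
            logits.append(content + order + distance)
        rows.append(softmax(logits))
    return rows


if __name__ == "__main__":
    # With zero content vectors, only the spatial terms shape the weights.
    zeros = [[0.0, 0.0] for _ in range(3)]
    w = spatially_calibrated_attention(zeros, zeros, 0.0, 0.0, distance_weight=1.0)
    print([round(x, 3) for x in w[2]])  # ≈ [0.09, 0.245, 0.665]
```

With zero content vectors the last row favors the most recent items purely because of the distance term, which is the kind of direct spatial signal the calibrator is meant to provide instead of relying only on position embeddings added to the input.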

Abstract

Transformer-based sequential recommendation (SR) has flourished in recent years, with the self-attention mechanism as its key component. Self-attention is widely believed to select the informative and relevant items from a sequence of interacted items for next-item prediction by assigning larger attention weights to them. However, this does not always hold in practice. Our empirical analysis of several representative Transformer-based SR models reveals that it is not uncommon for large attention weights to be assigned to less relevant items, which can lead to inaccurate recommendations. Further in-depth analysis identifies two factors that may contribute to this misallocation of attention weights: sub-optimal position encoding and noisy input. To address this significant yet challenging gap in existing work, we propose a simple yet effective framework called Attention Calibration for Transformer-based Sequential Recommendation (AC-TSR). In AC-TSR, a novel spatial calibrator and an adversarial calibrator are designed to directly calibrate incorrectly assigned attention weights. The former explicitly captures the spatial relationships (i.e., order and distance) among items for a more precise calculation of attention weights. The latter redistributes the attention weights based on each item's contribution to the next-item prediction. AC-TSR is readily adaptable and can be seamlessly integrated into various existing transformer-based SR models. Extensive experimental results on four real-world benchmark datasets demonstrate the superiority of AC-TSR through significant improvements in recommendation performance. The source code is available at https://github.com/AIM-SE/AC-TSR.
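To make "redistributing attention weights based on each item's contribution" concrete, here is a simplified, hypothetical stand-in for the adversarial calibrator. It perturbs the attention by zeroing one item's weight at a time, measures how much the score of the target (next) item drops, and blends the original weights with these normalized contributions. The helper functions and the `mix` parameter are illustrative assumptions; the paper's Perturbation and Correction Modules (and its balance parameter α) may work differently.

```python
def attend(weights, values):
    """Attention output: weighted sum of the value vectors."""
    dim = len(values[0])
    return [sum(w * v[t] for w, v in zip(weights, values)) for t in range(dim)]


def dot(a, b):
    """Score of a representation against a target item embedding."""
    return sum(x * y for x, y in zip(a, b))


def renormalize(ws):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(ws)
    if total <= 0:
        return [1.0 / len(ws)] * len(ws)
    return [w / total for w in ws]


def contribution_by_perturbation(weights, values, target):
    """Drop in the target score when each history item's weight is zeroed out.

    Hypothetical perturbation: mask one item, renormalize the rest, re-score.
    """
    base = dot(attend(weights, values), target)
    drops = []
    for j in range(len(weights)):
        perturbed = list(weights)
        perturbed[j] = 0.0
        drops.append(base - dot(attend(renormalize(perturbed), values), target))
    return drops


def calibrate(weights, values, target, mix=0.5):
    """Blend original attention with normalized positive contributions.

    Items whose removal *raises* the target score get zero contribution.
    `mix` is a hypothetical blending factor in [0, 1].
    """
    drops = contribution_by_perturbation(weights, values, target)
    contrib = renormalize([max(d, 0.0) for d in drops])
    return [(1.0 - mix) * w + mix * c for w, c in zip(weights, contrib)]


if __name__ == "__main__":
    # Item 0 receives the largest weight but points away from the target.
    weights = [0.6, 0.3, 0.1]
    values = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    target = [1.0, 0.0]
    print([round(w, 3) for w in calibrate(weights, values, target)])
    # ≈ [0.3, 0.547, 0.153]
```

In this toy case the misallocated weight on item 0 is halved and shifted to the items that actually support the target, while the calibrated weights still sum to 1 because both blended distributions do.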

Paper Structure (32 sections, 22 equations, 6 figures, 4 tables)

Introduction
Related Work
Sequential Recommendation
Debates on Attention Mechanism
Preliminary
Problem Setup
Transformer-based Recommenders
Embedding Layer
Transformer Block
Learning Objective
AC-TSR Framework
Overall Architecture
Spatial Calibrator
Adversarial Calibrator
Perturbation Module.
...and 17 more sections

Figures (6)

Figure 1: (a) Removing the highest attention weight from transformer-based SRS does not lead to a significant decrease in model performance and even improves performance in some cases; (b) Visualization of the attention weights from SASRec and our proposed AC-TSR.
Figure 2: Overview of the proposed AC-TSR framework. The SASRec model functions as the backbone, where its self-attention layer is converted into an Attention Calibration (AC) layer for improved performance. Each AC layer contains a spatial calibrator (purple dotted box) and an adversarial calibrator (green dotted box). The spatial calibrator is responsible for incorporating spatial information such as order and distance into the attention weights. The adversarial calibrator aims to identify decisive items and adjust the distribution of attention weights.
Figure 3: Impact of different aggregation strategies in the Correction Module.
Figure 4: Effect of balance parameter $\alpha$.
Figure 5: Comparison of the mean Kendall-$\tau$ correlation between attention weights and gradient importance measures. The results verify that our AC method can improve Kendall-$\tau$ correlation by a large margin.
...and 1 more figure
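Figures 1(a) and 5 rest on two simple diagnostics, sketched below under stated assumptions: removing the highest attention weight (and renormalizing the rest) before re-running prediction, and measuring the Kendall-τ rank correlation between attention weights and per-item importance scores. The sketch assumes the importance scores are already computed (for example by a gradient-based attribution method) and uses the tau-a variant without tie correction; the paper's exact variant and averaging procedure may differ. All function names are hypothetical.

```python
from itertools import combinations


def remove_top_attention(weights):
    """Zero the largest attention weight and renormalize the remaining ones."""
    top = max(range(len(weights)), key=weights.__getitem__)
    kept = [0.0 if j == top else w for j, w in enumerate(weights)]
    total = sum(kept)
    return [w / total for w in kept] if total > 0 else kept


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences of length >= 2")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if prod > 0:
            score += 1
        elif prod < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def mean_kendall_tau(attention_rows, importance_rows):
    """Average tau over many sequences (one attention/importance pair each)."""
    taus = [kendall_tau(a, g) for a, g in zip(attention_rows, importance_rows)]
    return sum(taus) / len(taus)


if __name__ == "__main__":
    attention = [0.6, 0.3, 0.1]      # most weight on item 0
    importance = [0.05, 0.7, 0.25]   # but item 1 matters most
    print(kendall_tau(attention, importance))   # negative: misaligned ranking
    print(remove_top_attention(attention))      # item 0 masked out
```

A negative or near-zero τ signals that the attention ranking disagrees with item importance, which is the misalignment that Figure 5 reports AC-TSR reduces.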
