Multimodal Enhancement of Sequential Recommendation

Bucher Sahyouni, Matthew Vowels, Liqun Chen, Simon Hadfield

TL;DR

MuSTRec tackles interaction scarcity by unifying multimodal item representations with sequential user dynamics, combining item-item and user-item graphs with a frequency-aware transformer. Modality-specific item-item graphs, frozen during training, work alongside a LightGCN-like backbone to propagate collaborative signals; the resulting representations are fed into a transformer head that preserves short-term dynamics via Fourier-based filtering. Experiments on Amazon datasets show improvements of up to 33.5% over multimodal and sequential baselines and indicate the need for a new data-partitioning regime. Injecting user embeddings can raise short-term metrics by up to 200% on smaller datasets, especially with appropriate weighting. The code is available in an anonymized repository with a public release planned, and the work points toward scalable, cross-modal, context-aware recommendation systems.

Abstract

We propose a novel recommender framework, MuSTRec (Multimodal and Sequential Transformer-based Recommendation), that unifies the multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted text and visual features. A frequency-based self-attention module additionally captures short- and long-term user preferences. Across multiple Amazon datasets, MuSTRec outperforms state-of-the-art multimodal and sequential baselines by up to 33.5%. Finally, we detail some notable facets of this new recommendation paradigm. These include the need for a new data partitioning regime and a demonstration of how integrating user embeddings into sequential recommendation drastically increases short-term metrics (up to 200% improvement) on smaller datasets. Our code is available at https://anonymous.4open.science/r/MuSTRec-D32B/ and will be made publicly available.
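
The abstract does not say how the item-item graphs are built from the text and visual features. As a rough illustration only, the sketch below assumes a top-k cosine-similarity construction with uniform edge weights, a common choice in multimodal recommenders but not confirmed here as the authors' method. The names `cosine`, `knn_item_graph`, and `text_features`, and the toy feature values, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def knn_item_graph(features, k):
    """Map each item index to its k most similar items, weighted uniformly.

    Assumption: neighbors are chosen by cosine similarity and each kept edge
    gets weight 1/k, so every item's outgoing weights sum to 1.
    """
    n = len(features)
    graph = {}
    for i in range(n):
        # Score every other item against item i (self-loops are excluded).
        sims = [(cosine(features[i], features[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties are broken by the smaller item index.
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors = [j for _, j in sims[:k]]
        graph[i] = {j: 1.0 / len(neighbors) for j in neighbors}
    return graph


# Hypothetical toy features: items 0 and 1 are alike, as are items 2 and 3.
text_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_graph = knn_item_graph(text_features, k=1)
```

Under this assumption, one such graph would be computed per modality (for example, one from text features and one from visual features) before training and then left unchanged, which is consistent with the TL;DR's description of the item-item graphs as frozen during training.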

Paper Structure (23 sections, 20 equations, 7 figures, 9 tables)

Figures (7)

  • Figure 1: The MuSTRec Architecture
  • Figure 2: $Clothing$ Ablation
  • Figure 3: Sensitivity analysis results on $\omega$ for the indicated datasets.
  • Figure 4: $Baby$ Sensitivity
  • Figure 5: $Baby$ Ablation
  • ...and 2 more figures