LMMRec: LLM-driven Motivation-aware Multimodal Recommendation

Yicheng Di; Zhanjie Zhang; Yun Wang; Jinren Liu; Jiaqi Yan; Jiyu Wei; Xiangyu Chen; Yuan Liu

TL;DR

This work addresses motivation-based multimodal recommendation by using large language models to extract fine-grained textual motivations and coupling them with interaction signals through a dual-encoder architecture. It introduces two core strategies, the Motivation Coordination Strategy and the Interaction-Text Correspondence Method, which promote stable cross-modal alignment and robustness to semantic noise through contrastive learning and momentum-based teacher-student learning. Experiments on Yelp, Amazon-book, and Steam show consistent gains in Recall and NDCG over baseline models, along with strong robustness to noise. The approach yields more interpretable and semantically grounded recommendations, with potential future extensions to causal motivation modeling and adaptive fusion in open-domain settings.

Abstract

Motivation-based recommendation systems uncover the drivers of user behavior. Motivation modeling, which is crucial for decision-making and content preference, helps explain how recommendations are generated. Existing methods often treat motivations as latent variables inferred from interaction data, neglecting heterogeneous information such as review text. Multimodal motivation fusion raises two challenges: 1) achieving stable cross-modal alignment amid noise, and 2) identifying features that reflect the same underlying motivation across modalities. To address these, we propose LLM-driven Motivation-aware Multimodal Recommendation (LMMRec), a model-agnostic framework that leverages large language models for deep semantic priors and motivation understanding. LMMRec uses chain-of-thought prompting to extract fine-grained user and item motivations from text. A dual-encoder architecture models textual and interaction-based motivations for cross-modal alignment, while the Motivation Coordination Strategy and the Interaction-Text Correspondence Method mitigate noise and semantic drift through contrastive learning and momentum updates. Experiments on three datasets show that LMMRec achieves up to a 4.98% performance improvement.
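The abstract names contrastive learning and momentum updates but does not give their exact form here. As a rough illustration only, the sketch below shows one common way such components are built: an InfoNCE-style loss that pulls each interaction-based embedding toward its paired text-based embedding, and an exponential-moving-average (EMA) update of teacher parameters from student parameters. The function names, the temperature of 0.2, and the momentum of 0.99 are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(interaction_embs, text_embs, temperature=0.2):
    """Mean InfoNCE loss (hypothetical helper, not the paper's exact loss).

    Row i of interaction_embs and row i of text_embs form a positive pair;
    every other text row acts as a negative for interaction row i.
    """
    losses = []
    for i, z in enumerate(interaction_embs):
        logits = [cosine(z, t) / temperature for t in text_embs]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def momentum_update(teacher, student, momentum=0.99):
    """EMA update: teacher <- momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy check: correctly paired embeddings should give a lower loss than swapped ones.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
new_teacher = momentum_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)
```

In this toy setup the loss is small when each interaction embedding matches its own text embedding and large when the pairs are swapped, which is the alignment pressure a contrastive objective provides. The slowly moving teacher gives a more stable target than the student alone, which is the usual motivation for momentum-based teacher-student schemes under noisy inputs.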
