LLM4Rec: Large Language Models for Multimodal Generative Recommendation with Causal Debiasing

Bo Ma, Hang Li, ZeHua Hu, XiaoFan Gui, LuYao Liu, Simon Lau

TL;DR

LLM4Rec tackles the core challenges of multimodal data handling, bias mitigation, transparency, and adaptability in generative recommendation by integrating five innovations around a large language model backbone. The approach combines a multimodal fusion architecture, retrieval-augmented generation, causal-inference-based debiasing, explainable generation, and real-time adaptive learning to produce personalized recommendations with natural language justifications. Experiments on MovieLens-25M, Amazon-Electronics, and Yelp-2023 show consistent gains in accuracy, fairness, and diversity, including up to a 2.3% improvement in NDCG@10 and a 1.4% improvement in diversity metrics, while optimized inference keeps the system computationally efficient. The work targets practical, scalable, and trustworthy recommendation systems by enabling continuous learning and transparent decision-making in multimodal contexts.

Abstract

Contemporary generative recommendation systems face significant challenges in handling multimodal data, eliminating algorithmic biases, and providing transparent decision-making processes. This paper introduces an enhanced generative recommendation framework that addresses these limitations through five key innovations: multimodal fusion architecture, retrieval-augmented generation mechanisms, causal inference-based debiasing, explainable recommendation generation, and real-time adaptive learning capabilities. Our framework leverages advanced large language models as the backbone while incorporating specialized modules for cross-modal understanding, contextual knowledge integration, bias mitigation, explanation synthesis, and continuous model adaptation. Extensive experiments on three benchmark datasets (MovieLens-25M, Amazon-Electronics, Yelp-2023) demonstrate consistent improvements in recommendation accuracy, fairness, and diversity compared to existing approaches. The proposed framework achieves up to 2.3% improvement in NDCG@10 and 1.4% enhancement in diversity metrics while maintaining computational efficiency through optimized inference strategies.
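
For readers unfamiliar with the headline metric, the following is a minimal sketch of how NDCG@10 is commonly computed for a single ranked list. It is not taken from the paper: the function names `dcg_at_k` and `ndcg_at_k` are illustrative, the gain is assumed to be linear in the relevance score (some evaluations use an exponential gain instead), and the ideal ranking is assumed to be a reordering of the same list.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (assumed linear gain)."""
    # The item at 0-based position `rank` is discounted by log2(rank + 2),
    # so the first item has a discount of log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    # Assumption: the ideal ranking is the same relevance scores sorted descending.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranked list: relevance of each recommended item, in rank order.
ranked_relevances = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
score = ndcg_at_k(ranked_relevances, k=10)
```

A score of 1.0 means the most relevant items are already ranked first; pushing relevant items further down the list lowers the score.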

Paper Structure

This paper contains 18 sections, 33 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Enhanced GenRec Framework Architecture: A comprehensive system integrating five key innovations (① Multimodal Fusion with cross-modal attention, ② Retrieval-Augmented Generation, ③ Causal Inference-based Debiasing, ④ Explainable Recommendation Generation, and ⑤ Real-time Adaptive Learning). The framework processes heterogeneous inputs through specialized encoders and generates personalized recommendations with natural language explanations.
  • Figure 2: Detailed multimodal fusion architecture with cross-modal attention mechanisms. The system processes textual content (reviews, descriptions), categorical features (genres, categories), and numerical signals (ratings, timestamps) through specialized encoders, applies pairwise cross-modal attention, and generates a unified representation through adaptive weighted fusion with residual connections.
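
The fusion pipeline described in the Figure 2 caption can be illustrated with a small, dependency-free sketch. This is a reading of the caption under stated assumptions, not the authors' implementation: each modality is assumed to be already encoded as a single fixed-size vector, attention is unparameterized scaled dot-product attention, and the adaptive weights come from a hypothetical `gate` vector. All function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attend(query, others):
    """Scaled dot-product attention of one modality vector over the other modalities."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in others])
    return [sum(w * vec[i] for w, vec in zip(weights, others)) for i in range(len(query))]


def fuse(modalities, gate):
    """Fuse modality vectors into one representation.

    modalities: dict mapping a modality name to its encoded vector.
    gate: hypothetical scoring vector that produces the adaptive fusion weights.
    """
    names = list(modalities)
    enriched = {}
    for name in names:
        others = [modalities[n] for n in names if n != name]
        attended = cross_modal_attend(modalities[name], others)
        # Residual connection: keep the original signal alongside the attended one.
        enriched[name] = [h + a for h, a in zip(modalities[name], attended)]
    # Adaptive weighted fusion: one softmax-normalized weight per modality.
    weights = softmax([dot(gate, enriched[n]) for n in names])
    fused = [sum(w * enriched[n][i] for w, n in zip(weights, names)) for i in range(len(gate))]
    return fused, dict(zip(names, weights))


# Hypothetical encoder outputs for the three input types named in the caption.
embeddings = {
    "text": [0.2, 0.1, 0.4, 0.3],
    "categorical": [0.0, 1.0, 0.0, 0.5],
    "numerical": [0.9, 0.2, 0.1, 0.0],
}
fused, fusion_weights = fuse(embeddings, gate=[0.5, -0.5, 0.25, 0.1])
```

In a trained model the attention projections and the gate would be learned parameters; they are fixed here only to keep the sketch short and runnable.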