
Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation"

Ranjan Mishra, Julian I. Bibo, Quinten van Engelen, Henk Schaapman

TL;DR

This work performs a rigorous reproducibility study of XRec, evaluating its explainable recommendation capabilities using Llama 3 instead of GPT-3.5-turbo and extending the analysis with variations to the Mixture of Experts embeddings. It preserves the original architecture that unifies collaborative filtering with a frozen LLM via a MoE adapter, injecting adapted GNN embeddings into prompt tokens to generate explanations, and evaluates across three datasets with multiple explainability metrics. The findings show that XRec can produce personalized explanations and gains stability when incorporating collaborative signals, but does not consistently outperform all baselines across all metrics; the MoE embeddings chiefly influence explanation structure, while GNN output embeddings have limited impact on overall performance. The study provides open-source code for evaluation and discusses the practical implications, limitations, and directions for future work in explainable recommendations with LLMs.
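The adapter step summarized above can be illustrated with a toy sketch. This is a hypothetical, pure-Python illustration, not XRec's code. It assumes a softmax gate over a few linear experts, made-up dimensions, and a prompt in which fixed placeholder positions hold the user and item tokens. The names `moe_adapt`, `inject`, and the other helpers are our own. Only the prompt-level injection described here is covered.

```python
import math
import random


def matvec(W, x):
    # Multiply a matrix W (list of rows) by a vector x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def moe_adapt(x, experts, gate):
    """Map a collaborative (GNN) embedding x into the LLM's embedding space.

    experts: K matrices of shape d_llm x d_gnn (hypothetical linear experts)
    gate:    K x d_gnn matrix giving one gating logit per expert
    """
    weights = softmax(matvec(gate, x))
    outputs = [matvec(E, x) for E in experts]
    # Gate-weighted sum of the expert outputs.
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(outputs[0]))]


def inject(token_embs, positions, adapted):
    # Overwrite the embeddings at placeholder-token positions with adapted vectors.
    seq = [list(t) for t in token_embs]
    for pos, vec in zip(positions, adapted):
        seq[pos] = list(vec)
    return seq


def random_matrix(rows, cols):
    # Small random weights for the toy example.
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy setup: d_gnn=4, d_llm=6, three experts, and a five-token prompt whose
# tokens 1 and 3 stand in for the user and item placeholders.
random.seed(0)
d_gnn, d_llm, k = 4, 6, 3
experts = [random_matrix(d_llm, d_gnn) for _ in range(k)]
gate = random_matrix(k, d_gnn)
user_emb, item_emb = [0.1] * d_gnn, [-0.2] * d_gnn
prompt = [[0.0] * d_llm for _ in range(5)]
adapted = [moe_adapt(user_emb, experts, gate), moe_adapt(item_emb, experts, gate)]
seq = inject(prompt, [1, 3], adapted)
```

Because the gate weights sum to one, each adapted vector is a convex combination of the expert outputs. A practical version would use batched tensor operations instead of Python lists.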

Abstract

In this study, we reproduced the work done in the paper "XRec: Large Language Models for Explainable Recommendation" by Ma et al. (2024). The original authors introduced XRec, a model-agnostic collaborative instruction-tuning framework that enables large language models (LLMs) to provide users with comprehensive explanations of generated recommendations. Our objective was to replicate the results of the original paper, albeit using Llama 3 as the LLM for evaluation instead of GPT-3.5-turbo. We built on the source code provided by Ma et al. (2024) to achieve our goal. Our work extends the original paper by modifying the input embeddings or deleting the output embeddings of XRec's Mixture of Experts module. Based on our results, XRec effectively generates personalized explanations and its stability is improved by incorporating collaborative information. However, XRec did not consistently outperform all baseline models in every metric. Our extended analysis further highlights the importance of the Mixture of Experts embeddings in shaping the explanation structures, showcasing how collaborative signals interact with language modeling. Through our work, we provide an open-source evaluation implementation that enhances accessibility for researchers and practitioners alike. Our complete code repository can be found at https://github.com/julianbibo/xrec-reproducibility.
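The two extensions mentioned above can be pictured as switches on the same input pipeline. This sketch is our own hypothetical wiring, not the authors' implementation. The `fixed_input` mode feeds one constant vector into the MoE for every user and item; which values the study actually used is not stated here, so the zero vector below is an assumption. The `drop_output` mode interprets "deleting the output embeddings" as removing the placeholder positions, so that no adapted vector reaches the LLM.

```python
import math


def moe_adapt(x, experts, gate):
    # Softmax-gated mixture of linear experts (a toy stand-in for XRec's adapter).
    logits = [sum(g * v for g, v in zip(row, x)) for row in gate]
    m = max(logits)
    w = [math.exp(z - m) for z in logits]
    total = sum(w)
    w = [v / total for v in w]
    outs = [[sum(a * v for a, v in zip(row, x)) for row in E] for E in experts]
    return [sum(wk * o[i] for wk, o in zip(w, outs)) for i in range(len(outs[0]))]


def build_sequence(prompt, positions, cf_embs, experts, gate,
                   mode="full", fixed=None):
    """Assemble prompt embeddings under one of three hypothetical modes.

    "full":        adapt the real collaborative embeddings and inject them
    "fixed_input": feed the same fixed vector into the MoE for every entity
    "drop_output": delete the placeholder positions so that no adapted
                   embedding reaches the LLM
    """
    if mode == "drop_output":
        skip = set(positions)
        return [list(t) for i, t in enumerate(prompt) if i not in skip]
    if mode == "fixed_input":
        if fixed is None:
            raise ValueError("fixed_input mode needs a fixed vector")
        cf_embs = [fixed] * len(cf_embs)
    elif mode != "full":
        raise ValueError(f"unknown mode: {mode}")
    seq = [list(t) for t in prompt]
    for pos, x in zip(positions, cf_embs):
        seq[pos] = moe_adapt(x, experts, gate)
    return seq


# Two experts mapping 2-d collaborative embeddings to a 3-d LLM space.
experts = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.0, 0.0], [1.0, -1.0]],
]
gate = [[1.0, 0.0], [0.0, 1.0]]
prompt = [[0.0, 0.0, 0.0] for _ in range(4)]
cf = [[1.0, 2.0], [3.0, -1.0]]  # user and item embeddings
full_seq = build_sequence(prompt, [1, 2], cf, experts, gate)
fixed_seq = build_sequence(prompt, [1, 2], cf, experts, gate,
                           mode="fixed_input", fixed=[0.0, 0.0])
dropped_seq = build_sequence(prompt, [1, 2], cf, experts, gate, mode="drop_output")
```

With a fixed input, the user and item slots receive identical vectors, so any personalization must come from the prompt text alone. With the outputs dropped, the four-token sequence shrinks to two tokens.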


Paper Structure

This paper contains 29 sections, 1 equation, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: The overall architecture of XRec, as shown in the original paper (Ma et al., 2024).
  • Figure 3: Training loss of the full XRec model (orange) and of a model that uses fixed MoE input embeddings (gray), both trained on the Amazon-books dataset.
  • Figure : (a) A copy of Figure 3 from Ma et al. (2024), showing the results of their ablation study.
  • Figure : (b) The results of our reproducibility study. Note that the y-axis here starts at 0, unlike in (a).