The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation

Jiaheng Zhang, Daqiang Zhang

TL;DR

The paper addresses the trade-off between recommendation accuracy and explanation quality by introducing Prism, a fully decoupled framework that uses a faithfulness-constrained knowledge-distillation pipeline to train a compact explanation generator from a large teacher LLM. By separating ranking from explanation generation, Prism achieves strong human-perceived faithfulness and personalization while delivering substantial efficiency gains (over 24x faster, ~10x less memory) and plug-and-play compatibility with any ranking module. It demonstrates that a compact student can surpass its large teacher on human judgments and even mitigate teacher hallucinations, revealing a knowledge refinement effect from distillation. The work highlights the limitations of automatic metrics like ROUGE-L for this task and provides a practical, trustworthy pathway for deploying explainable recommendations in real-world systems.

Abstract

The integration of Large Language Models (LLMs) into explainable recommendation systems often leads to a performance-efficiency trade-off in end-to-end architectures, where joint optimization of ranking and explanation can result in suboptimal compromises. To resolve this, we propose Prism, a novel decoupled framework that rigorously separates the recommendation process into a dedicated ranking stage and an explanation generation stage. This decomposition ensures that each component is optimized for its specific objective, eliminating inherent conflicts in coupled models. Inspired by knowledge distillation, Prism leverages a powerful, instruction-following teacher LLM (FLAN-T5-XXL) as an Oracle to produce high-fidelity explanatory knowledge. A compact, fine-tuned student model (BART-Base), the Prism, then specializes in synthesizing this knowledge into personalized explanations. Our extensive experiments on benchmark datasets reveal a key finding: the distillation process not only transfers knowledge but also acts as a noise filter. Our 140M-parameter Prism model significantly outperforms its 11B-parameter teacher in human evaluations of faithfulness and personalization, demonstrating an emergent ability to correct hallucinations present in the teacher's outputs. Prism also achieves a 24x speedup and a 10x reduction in memory consumption. Our analysis validates that decoupling, combined with targeted distillation, provides an efficient and effective pathway to high-quality and, perhaps more importantly, trustworthy explainable recommendation.
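
To make the "noise filter" idea concrete, the following is a minimal sketch of how teacher outputs might be screened before they are used to fine-tune the student. It is not the paper's procedure: the `Example` record, the `catalog` of known item titles, and the rule "an explanation may only mention items from the user's history or the recommended item" are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Example:
    """One teacher-generated training candidate (hypothetical schema)."""
    user_history: List[str]    # titles the user has interacted with
    item: str                  # title of the recommended item
    teacher_explanation: str   # explanation text produced by the teacher LLM


def mentioned_items(text: str, catalog: Set[str]) -> Set[str]:
    """Return catalog titles that appear (case-insensitively) in the text."""
    lowered = text.lower()
    return {title for title in catalog if title.lower() in lowered}


def is_faithful(example: Example, catalog: Set[str]) -> bool:
    """Assumed faithfulness proxy: every catalog title the explanation
    mentions must be in the user's history or be the recommended item."""
    allowed = set(example.user_history) | {example.item}
    return mentioned_items(example.teacher_explanation, catalog) <= allowed


def build_distillation_set(examples: List[Example],
                           catalog: Set[str]) -> List[Example]:
    """Keep only the teacher outputs that pass the faithfulness proxy."""
    return [ex for ex in examples if is_faithful(ex, catalog)]
```

Under this assumed rule, an explanation that cites a title the user never interacted with is dropped, so the student is never trained to reproduce that particular kind of hallucination. A real pipeline would likely need a stronger check than substring matching.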

Paper Structure

This paper contains 39 sections, 9 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: The overall framework of Prism. The offline stage consists of a teacher phase for data creation via knowledge distillation and a student phase for model fine-tuning. The online stage demonstrates how Prism functions as a decoupled module alongside any SOTA recommender.
  • Figure 2: Automatic evaluation results on ROUGE-L, GPTScore, and BS-F1 metrics across Yelp and MovieLens-1M datasets.
  • Figure 3: Human evaluation results on persuasiveness, personalization, and faithfulness dimensions.
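
The online stage described for Figure 1, where the explainer runs alongside an arbitrary recommender, can be sketched as a simple composition of two independent callables. The names `Ranker`, `Explainer`, `recommend_with_explanations`, and both stub functions are hypothetical and only illustrate the decoupled interface; in a real deployment the explainer would wrap the fine-tuned student model.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: neither component knows about the other.
Ranker = Callable[[str, int], List[str]]   # (user_id, k) -> ranked item ids
Explainer = Callable[[str, str], str]      # (user_id, item_id) -> explanation


def recommend_with_explanations(user_id: str, k: int, ranker: Ranker,
                                explainer: Explainer) -> List[Tuple[str, str]]:
    """Rank first, then explain each ranked item independently."""
    items = ranker(user_id, k)
    return [(item, explainer(user_id, item)) for item in items]


def popularity_ranker(user_id: str, k: int) -> List[str]:
    """Stub ranker standing in for any recommender."""
    return ["item_a", "item_b", "item_c"][:k]


def template_explainer(user_id: str, item_id: str) -> str:
    """Stub explainer standing in for the distilled student model."""
    return f"Recommended {item_id} for {user_id}."
```

Because the ranker is only a function argument, swapping in a different recommender requires no change to the explanation component, which is the plug-and-play property the framework claims.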