Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors

Chen Huang, Peixin Qin, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

TL;DR

This paper introduces Concept, an inclusive CRS evaluation protocol that unifies system- and user-centric factors into three characteristics (Recommendation Intelligence, Social Intelligence, and Personification) and six abilities, evaluated via an LLM-based user simulator and evaluator with fine-grained rubrics. Testing off-the-shelf CRS models on 6720 simulated conversations across Redial and OpenDialKG, the study finds that the state-of-the-art CHATCRS excels in social cooperation and recommendation quality but suffers from identity-related and reliability issues, including hallucinations and deceptive explanations. The results demonstrate the value of a quantitative, rubric-driven evaluation framework for diagnosing strengths and risks in CRS behavior, and highlight the need for identity-aware, trustworthy, and socially aware systems. Concept thereby sets a foundation for more user-centric and ethically aligned CRS improvements and gives researchers and practitioners actionable guidance. The work also discusses limitations, notably potential biases in LLM-based simulation and the need for higher-quality, more diverse datasets to further validate the protocol.

Abstract

The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite recent significant progress achieved in academia. Existing evaluation protocols for CRS may prioritize system-centric factors such as effectiveness and fluency in conversation while neglecting user-centric aspects. Thus, we propose a new and inclusive evaluation protocol, Concept, which integrates both system- and user-centric factors. We conceptualise three key characteristics in representing such factors and further divide them into six primary abilities. To implement Concept, we adopt an LLM-based user simulator and evaluator with scoring rubrics that are tailored for each primary ability. Our protocol, Concept, serves a dual purpose. First, it provides an overview of the pros and cons of current CRS models. Second, it pinpoints the problem of low usability in the "omnipotent" ChatGPT and offers a comprehensive reference guide for evaluating CRS, thereby setting the foundation for CRS improvement.

Paper Structure

This paper contains 31 sections, 11 figures, and 18 tables.

Figures (11)

  • Figure 1: Concept integrates both system- and user-centric factors into three characteristics based on the previous taxonomy on human-AI interactions. Such characteristics are further divided into six primary abilities to enhance the inclusiveness in evaluations.
  • Figure 2: Evaluation overview.
  • Figure 3: CHATCRS is sensitive to contextual nuance.
  • Figure 4: Evaluation on social-centric characteristics. CRS strives to self-express sincerely.
  • Figure 4: Results of persuasiveness scores. CHATCRS is highly persuasive.
  • ...and 6 more figures