Dagstuhl Perspectives Workshop 24352 -- Conversational Agents: A Framework for Evaluation (CAFE): Manifesto

Christine Bauer, Li Chen, Nicola Ferro, Norbert Fuhr, Avishek Anand, Timo Breuer, Guglielmo Faggioli, Ophir Frieder, Hideo Joho, Jussi Karlgren, Johannes Kiesel, Bart P. Knijnenburg, Aldo Lipani, Lien Michiels, Andrea Papenmeier, Maria Soledad Pera, Mark Sanderson, Scott Sanner, Benno Stein, Johanne R. Trippas, Karin Verspoor, Martijn C. Willemsen

TL;DR

The manifesto tackles the challenge of evaluating conversational information access (CONIAC) systems, which merge information retrieval (IR) and recommender system (RS) capabilities. It introduces a two-layer CONIAC World Model (a Process Layer and a System Layer) to represent dynamic conversations and stateful interaction, and proposes the CAFE framework with a dynamic, probe-based Evaluation Layer. Its six components (Stakeholder Goals, User Aspects, Task, Criteria, Methodology, and Measures) together define a time-aware, multi-faceted evaluation workflow that spans both system-centric and user-centric perspectives. The paper highlights the limitations of current evaluation practice, outlines concrete research directions, and argues that widespread adoption of the framework would advance cross-disciplinary development, funding, and industry practice in CONIAC research.

Abstract

During the workshop, we discussed in depth what CONversational Information ACcess (CONIAC) is and what its unique features are, proposed a world model that abstracts it, and defined the Conversational Agents Framework for Evaluation (CAFE) for evaluating CONIAC systems. CAFE consists of six major components: 1) the goals of the system's stakeholders, 2) the user tasks to be studied in the evaluation, 3) the aspects of the users carrying out the tasks, 4) the evaluation criteria to be considered, 5) the evaluation methodology to be applied, and 6) the measures for the quantitative criteria chosen.

Paper Structure

This paper contains 22 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: An example of a possible conversation in the product search domain.
  • Figure 2: An example of the conversation in Figure 1 using GPT-3.
  • Figure 3: An example of the conversation in Figure 1 using GPT-4 via Microsoft Copilot.
  • Figure 4: The CONIAC World Model.
  • Figure 5: Evaluation Layer in the CONIAC World Model.