Findings of the First Workshop on Simulating Conversational Intelligence in Chat

Yvette Graham; Mohammed Rameez Qureshi; Haider Khalid; Gerasimos Lampouras; Ignacio Iacobacci; Qun Liu

TL;DR

The main goal of this paper is to provide an overview of the shared task and a link to an additional paper that will include an in-depth analysis of the shared task results following presentation at the workshop.

Abstract

The aim of the workshop was to bring together experts working on open-domain dialogue research. In this rapidly advancing research area many challenges remain, such as learning information from conversations and engaging in a realistic and convincing simulation of human intelligence and reasoning. SCI-CHAT follows previous workshops on open-domain dialogue, but in contrast the focus of its shared task is the simulation of intelligent conversation as judged in a live human evaluation. Models aim to follow a challenging topic over a multi-turn conversation while positing, refuting, and reasoning over arguments. The workshop included both a research track and a shared task. The main goal of this paper is to provide an overview of the shared task and an in-depth analysis of the shared task results following presentation at the workshop. This paper extends the version made available before the results were presented at the workshop at EACL in Malta (Graham et al., 2024). The data collected in the evaluation and the code were made publicly available to aid future research.

Paper Structure (10 sections, 4 figures, 2 tables)

Introduction
Shared Task
Participating Models
Human Evaluation
Direct Assessment
Participating Systems
Evaluation Criteria
Evaluation and Results
Conclusion
Appendix

Figures (4)

Figure 1: Rater Agreement
Figure 2: Significance test results, where the p-value is calculated from the distribution of ratings for each model in a Mann-Whitney U test.
Figure 3: Score distributions of the top 10 individual workers in the SCI-CHAT live human evaluation
Figure 4: Continuous rating scale employed in the SCI-CHAT live human evaluation