"Wait, did you mean the doctor?": Collecting a Dialogue Corpus for Topical Analysis

Amandine Decker, Vincent Tourneur, Maxime Amblard, Ellen Breitholtz

TL;DR

The paper studies how topics are organized in dialogue and proposes a controlled data-collection approach designed to yield rich information about topic shifts. It introduces a dyadic protocol built on the balloon task and a Matrix-based messaging tool (Element) that can modify messages in transit to influence topic representation, while logging all interactions. A pilot study with 12 participants demonstrates feasibility and shows how different manipulations affect topic negotiation, topic perception, and agreement between participants. The work lays the groundwork for a scalable, multilingual dialogue corpus tailored to topical analysis, which could in turn help improve current topic-modeling and discourse-segmentation approaches.

Abstract

Dialogue is at the core of human behaviour and being able to identify the topic at hand is crucial to take part in conversation. Yet, there are few accounts of the topical organisation in casual dialogue and of how people recognise the current topic in the literature. Moreover, analysing topics in dialogue requires conversations long enough to contain several topics and types of topic shifts. Such data is complicated to collect and annotate. In this paper we present a dialogue collection experiment which aims to build a corpus suitable for topical analysis. We will carry out the collection with a messaging tool we developed.

Paper Structure

This paper contains 6 sections and 1 figure.

Figures (1)

  • Figure 1: Possible manipulations of the messages with the tool. The bot in each room enables the server to be notified of the incoming messages, and to send them back in other rooms.
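
The relay described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each participant writes in their own room and that the server forwards every message to the partner's room, optionally altering it first. This is not the authors' implementation and it does not use the Matrix API: the `Relay` class, the room names, and the `swap_word` manipulation are all hypothetical, chosen only to show the notify, modify, forward, and log steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A manipulation maps the original text to the text to deliver,
# or returns None to drop the message. (Hypothetical interface.)
Manipulation = Callable[[str], Optional[str]]


def identity(text: str) -> Optional[str]:
    """Default manipulation: deliver the message unchanged."""
    return text


def swap_word(old: str, new: str) -> Manipulation:
    """Build a manipulation that replaces one word with another (illustrative only)."""
    def apply(text: str) -> Optional[str]:
        return text.replace(old, new)
    return apply


@dataclass
class Relay:
    """Stand-in for the server that sits between the participants' rooms."""
    routes: Dict[str, str]                      # source room -> destination room
    manipulation: Manipulation = identity
    log: List[dict] = field(default_factory=list)
    outbox: Dict[str, List[str]] = field(default_factory=dict)

    def on_message(self, room: str, text: str) -> Optional[str]:
        """Called when the bot in `room` reports an incoming message."""
        target = self.routes[room]
        delivered = self.manipulation(text)
        # Keep both versions so the original dialogue can be reconstructed later.
        self.log.append(
            {"from": room, "to": target, "original": text, "delivered": delivered}
        )
        if delivered is not None:
            self.outbox.setdefault(target, []).append(delivered)
        return delivered


relay = Relay(
    routes={"room_a": "room_b", "room_b": "room_a"},
    manipulation=swap_word("doctor", "teacher"),
)
relay.on_message("room_a", "I think we should keep the doctor.")
print(relay.outbox["room_b"])
print(relay.log[0]["original"])
```

Logging the original alongside the delivered text is the design point of the sketch: whatever manipulation is applied, the log retains what each participant actually wrote and what the partner actually saw.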