"Wait, did you mean the doctor?": Collecting a Dialogue Corpus for Topical Analysis
Amandine Decker, Vincent Tourneur, Maxime Amblard, Ellen Breitholtz
TL;DR
The paper studies the topical organisation of dialogue and proposes a controlled data-collection approach that yields rich topic-shift information. It introduces a dyadic protocol built around a balloon task and a Matrix-based messaging tool (Element) that can modify messages in transit to influence topic representation while logging all interactions. A pilot study with 12 participants demonstrates feasibility and shows how different manipulations affect topic negotiation, offering insights into topic perception and agreement dynamics. The work lays the groundwork for a scalable, multilingual dialogue corpus tailored to topical analysis, which could help improve current topic-modelling and discourse-segmentation approaches.
Abstract
Dialogue is at the core of human behaviour, and being able to identify the topic at hand is crucial to taking part in a conversation. Yet the literature offers few accounts of the topical organisation of casual dialogue or of how people recognise the current topic. Moreover, analysing topics in dialogue requires conversations long enough to contain several topics and several types of topic shift. Such data is complicated to collect and annotate. In this paper, we present a dialogue collection experiment that aims to build a corpus suitable for topical analysis. We will carry out the collection with a messaging tool we developed.
