With a Little Help from my (Linguistic) Friends: Topic Segmentation of Multi-party Casual Conversations
Amandine Decker, Maxime Amblard
TL;DR
This work tackles Dialogue Topic Segmentation (DTS) in open-domain, multi-party casual conversations by evaluating a TextTiling-based framework enhanced with linguistic features and by reproducing a BERT-based coherence model. It shows that a feature-based approach (Block Comparison with Speaker Depth) can match or exceed neural methods on the Friends dataset while offering explainability and lower computation, whereas the BERT-based model exhibits stability issues on complex, conversational data. Through systematic experiments, the authors demonstrate the value of utterance-focused spans, memory for new vocabulary, and speaker/discourse cues in identifying topic boundaries. The results suggest practical DTS strategies for open-domain interactions and point to future work in integrating newer models and multimodal signals for robust topic segmentation.
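For readers unfamiliar with TextTiling, the following is a minimal sketch of the generic block-comparison idea that the framework builds on, not the authors' implementation. It compares bag-of-words vectors of the utterances on each side of every gap and marks a boundary where the similarity curve dips deeply. The function names, the whitespace tokenisation, the `block_size` default and the fixed `threshold` are all illustrative assumptions; the paper's linguistic features (such as Speaker Depth or vocabulary memory) are not modelled here.

```python
from collections import Counter
import math


def bag_of_words(utterances):
    """Count lower-cased, whitespace-separated tokens over a block of utterances."""
    counts = Counter()
    for utterance in utterances:
        counts.update(utterance.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def gap_similarities(utterances, block_size=2):
    """Similarity between the blocks before and after each gap between utterances."""
    sims = []
    for gap in range(1, len(utterances)):
        left = bag_of_words(utterances[max(0, gap - block_size):gap])
        right = bag_of_words(utterances[gap:gap + block_size])
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """Depth of each valley: distance to the nearest peak on the left plus on the right."""
    depths = []
    for i, s in enumerate(sims):
        left_peak = s
        j = i - 1
        while j >= 0 and sims[j] >= left_peak:  # climb left while the curve rises
            left_peak = sims[j]
            j -= 1
        right_peak = s
        j = i + 1
        while j < len(sims) and sims[j] >= right_peak:  # climb right likewise
            right_peak = sims[j]
            j += 1
        depths.append((left_peak - s) + (right_peak - s))
    return depths


def segment(utterances, block_size=2, threshold=0.5):
    """Return indices of utterances that start a new segment.

    `threshold` is an assumed fixed cutoff chosen for illustration only.
    """
    depths = depth_scores(gap_similarities(utterances, block_size))
    return [i + 1 for i, d in enumerate(depths) if d > threshold]


dialogue = [
    "the pizza was great",
    "yes great pizza tonight",
    "pizza with cheese was great",
    "my train leaves early",
    "which train early morning",
    "the early train is late",
]
boundaries = segment(dialogue)
```

On this toy dialogue the only gap with no shared vocabulary is the one before the fourth utterance, so that is where the sketch places its single boundary. Real casual conversation is far noisier, which is why the choice of span, memory and speaker cues matters.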
Abstract
Topics play an important role in the global organisation of a conversation, as what is currently discussed constrains the possible contributions of the participants. Understanding the way topics are organised in interaction would provide insight into the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task, which we approach by first segmenting dialogues into smaller, topically coherent sets of utterances. Understanding the interactions between these segments would then enable us to propose a model of topic organisation at the dialogue level. In this paper we work with open-domain conversations and aim to reach a level of accuracy comparable to that of recent machine-learning-based topic segmentation models, but with a formal approach. The features we identify as meaningful for this task help us better understand the topical structure of a conversation.
