With a Little Help from my (Linguistic) Friends: Topic Segmentation of Multi-party Casual Conversations

Amandine Decker; Maxime Amblard

TL;DR

This work tackles Dialogue Topic Segmentation (DTS) in open-domain, multi-party casual conversations by evaluating a TextTiling-based framework enhanced with linguistic features and by reproducing a BERT-based coherence model. It shows that a feature-based approach (Block Comparison with Speaker Depth) can match or exceed neural methods on the Friends dataset while offering explainability and lower computation, whereas the BERT-based model exhibits stability issues on complex, conversational data. Through systematic experiments, the authors demonstrate the value of utterance-focused spans, memory for new vocabulary, and speaker/discourse cues in identifying topic boundaries. The results suggest practical DTS strategies for open-domain interactions and point to future work in integrating newer models and multimodal signals for robust topic segmentation.

Abstract

Topics play an important role in the global organisation of a conversation, as what is currently discussed constrains the possible contributions of the participants. Understanding the way topics are organised in interaction would provide insight into the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task, which we approach by first segmenting dialogues into smaller, topically coherent sets of utterances. Understanding the interactions between these segments would then enable us to propose a model of topic organisation at the dialogue level. In this paper we work with open-domain conversations and try to reach a level of accuracy comparable to that of recent machine-learning-based topic segmentation models, but with a formal approach. The features we identify as meaningful for this task help us better understand the topical structure of a conversation.

Paper Structure (26 sections, 3 figures, 7 tables)

This paper contains 26 sections, 3 figures, 7 tables.

Introduction
Related work
Topic segmentation
TextTiling Approach
Methodology
Models
Baselines
Evaluation Metrics
Dataset
Adapting Xing and Carenini's (2021) BERT-based model to our Dataset
Learning Curve
Coherence Layers
Model Stability
Improving the original TextTiling algorithm with Linguistic Features
Adaptations of the Original TextTiling Algorithm
...and 11 more sections

Figures (3)

Figure 1: Segmentation in spans of $w$ tokens and computation of the lexical scores in the TextTiling algorithm.
Figure 2: Examples of lexical scores used in the depth scores computation.
Figure 3: Levels of coherence considered by Xing and Carenini (2021).
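
The computation described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch of the classic TextTiling procedure: tokens are cut into spans of $w$ tokens, a lexical score is computed at each gap between spans by comparing the blocks on either side, and a depth score measures how deep each valley is relative to the surrounding peaks. This is not the authors' adapted implementation. The function names, the block size `k`, the use of raw token counts, and the cosine similarity are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexical_scores(tokens, w=4, k=2):
    """Split tokens into spans of w tokens and score every gap between spans.

    The score at a gap compares the k spans before it with the k spans
    after it (block comparison). w and k are illustrative defaults.
    """
    spans = [tokens[i:i + w] for i in range(0, len(tokens), w)]
    scores = []
    for gap in range(1, len(spans)):
        left = Counter(t for s in spans[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in spans[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each gap: distance from its score up to the nearest peak
    on the left plus the distance up to the nearest peak on the right."""
    depths = []
    for i, s in enumerate(scores):
        left = s
        for j in range(i - 1, -1, -1):  # climb left while scores do not decrease
            if scores[j] < left:
                break
            left = scores[j]
        right = s
        for j in range(i + 1, len(scores)):  # climb right while scores do not decrease
            if scores[j] < right:
                break
            right = scores[j]
        depths.append((left - s) + (right - s))
    return depths


# Toy input: the vocabulary changes completely halfway through.
tokens = "a b a b a b a b x y x y x y x y".split()
scores = lexical_scores(tokens, w=4, k=1)
depths = depth_scores(scores)
```

In this toy input the deepest valley falls at the gap where the vocabulary changes, which is where a boundary would be proposed. A full pipeline would additionally apply a threshold on the depth scores to select boundaries, which this sketch leaves out.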
