Forecasting Conversation Derailments Through Generation

Yunfan Zhang, Kathleen McKeown, Smaranda Muresan

TL;DR

The work forecasts future conversation derailments from benign histories in three steps: a fine-tuned LLM generates multiple plausible future turns, a dedicated classifier predicts derailment for each continuation, and a majority vote aggregates the predictions. The framework can optionally use social orientation labels to guide generation. It is validated on CGA-Wiki and BNC, where it shows significant accuracy gains over prior methods and GPT-4o few-shot baselines. Ablations cover generation depth and social cues, and the authors acknowledge the higher computational cost of generating before predicting. The results have practical implications for proactive moderation and conflict mitigation, and the authors provide open-source resources for replication and extension.

Abstract

Forecasting conversation derailment can be useful in real-world settings such as online content moderation, conflict resolution, and business negotiations. However, despite language models' success at identifying offensive speech present in conversations, they struggle to forecast future conversation derailments. In contrast to prior work that predicts conversation outcomes solely based on the past conversation history, our approach samples multiple future conversation trajectories conditioned on the existing conversation history using a fine-tuned LLM. It then predicts the conversation outcome based on the consensus of these trajectories. We also experiment with leveraging socio-linguistic attributes, which reflect turn-level conversation dynamics, as guidance when generating future conversations. Our method of sampling future conversation trajectories surpasses state-of-the-art results on English conversation derailment prediction benchmarks and demonstrates significant accuracy gains in ablation studies.

Paper Structure

This paper contains 31 sections, 5 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: An example conversation from the BNC dataset, including background and the future turn. Offensive speech is highlighted in red. Our task requires forecasting whether a derailment will occur in the future based on the conversation so far.
  • Figure 2: An illustration of our methodology. Social orientation labels are highlighted in brown. We sample multiple potential conversation continuations from a given conversation history. Then, we predict individual conversation outcomes by combining each continuation with the given conversation history. We use the majority of the individual results to predict our final conversation outcome.
  • Figure 3: An example conversation from the BNC dataset, including background and the future turn as generated by our fine-tuned LLM. Social orientation labels are highlighted in brown. Offensive speech is highlighted in red. When given only the benign conversation history, the classifier fails to forecast whether a derailment will happen. Generating future conversation turns and providing them to the classifier allows it to forecast the derailment correctly.
  • Figure 4: Human evaluation results for the accuracy of GPT-4o-annotated social orientation labels. Overall, the GPT-4o annotations exhibit good quality, with human evaluators agreeing with the predicted labels 70% of the time.
  • Figure 5: User interface for the human evaluation of social orientation labels. Human annotators were asked to evaluate label quality turn-by-turn and axis-by-axis.
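
The pipeline summarized in the Figure 2 caption (sample several continuations, classify each one together with the history, take the majority) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `forecast_derailment`, `make_toy_generator`, and `toy_classifier` are hypothetical names, the generator and classifier are toy stand-ins for the fine-tuned LLM and the dedicated classifier, and the choice to treat a tie as "no derailment" is an assumption the source does not specify.

```python
import itertools
from typing import Callable, List, Sequence


def forecast_derailment(
    history: Sequence[str],
    generate_continuation: Callable[[Sequence[str]], List[str]],
    classify: Callable[[Sequence[str]], bool],
    num_samples: int = 5,
) -> bool:
    """Forecast derailment by majority vote over sampled continuations.

    Assumption: a tie (possible when num_samples is even) counts as
    "no derailment"; the source does not say how ties are resolved.
    """
    votes = []
    for _ in range(num_samples):
        # Sample one plausible future from the conversation so far.
        continuation = generate_continuation(history)
        # Classify the history combined with the sampled future turns.
        votes.append(classify(list(history) + list(continuation)))
    # Strict majority of the individual predictions decides the outcome.
    return sum(votes) > num_samples / 2


def make_toy_generator(canned: List[List[str]]):
    """Hypothetical stand-in for the fine-tuned LLM: cycles canned futures."""
    cycle = itertools.cycle(canned)

    def generate(history: Sequence[str]) -> List[str]:
        return next(cycle)

    return generate


def toy_classifier(turns: Sequence[str]) -> bool:
    """Hypothetical stand-in for the derailment classifier (keyword match)."""
    return any("idiot" in turn.lower() for turn in turns)


if __name__ == "__main__":
    history = ["I reverted your edit.", "Why? It was properly sourced."]
    generator = make_toy_generator(
        [
            ["Only an idiot would cite that."],
            ["Fair point, let's check the source together."],
            ["You are being an idiot about this."],
        ]
    )
    print(forecast_derailment(history, generator, toy_classifier, num_samples=3))
```

In a real setting the two callables would wrap model inference, and the number of samples trades accuracy against the higher computational cost noted in the TL;DR.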