Table of Contents
Fetching ...

Robots in the Middle: Evaluating LLMs in Dispute Resolution

Jinzhe Tan, Hannes Westermann, Nikhil Reddy Pottanigari, Jaromír Šavelka, Sébastien Meeùs, Mia Godet, Karim Benyekhlef

TL;DR

This paper investigates to which extent large language models (LLMs) are able to act as mediators, and whether LLMs are able to analyze dispute conversations, select suitable intervention types, and generate appropriate intervention messages.

Abstract

Mediation is a dispute resolution method featuring a neutral third-party (mediator) who intervenes to help the individuals resolve their dispute. In this paper, we investigate to which extent large language models (LLMs) are able to act as mediators. We investigate whether LLMs are able to analyze dispute conversations, select suitable intervention types, and generate appropriate intervention messages. Using a novel, manually created dataset of 50 dispute scenarios, we conduct a blind evaluation comparing LLMs with human annotators across several key metrics. Overall, the LLMs showed strong performance, even outperforming our human annotators across dimensions. Specifically, in 62% of the cases, the LLMs chose intervention types that were rated as better than or equivalent to those chosen by humans. Moreover, in 84% of the cases, the intervention messages generated by the LLMs were rated as better than or equal to the intervention messages written by humans. LLMs likewise performed favourably on metrics such as impartiality, understanding and contextualization. Our results demonstrate the potential of integrating AI in online dispute resolution (ODR) platforms.

Robots in the Middle: Evaluating LLMs in Dispute Resolution

TL;DR

This paper investigates to which extent large language models (LLMs) are able to act as mediators, and whether LLMs are able to analyze dispute conversations, select suitable intervention types, and generate appropriate intervention messages.

Abstract

Mediation is a dispute resolution method featuring a neutral third-party (mediator) who intervenes to help the individuals resolve their dispute. In this paper, we investigate to which extent large language models (LLMs) are able to act as mediators. We investigate whether LLMs are able to analyze dispute conversations, select suitable intervention types, and generate appropriate intervention messages. Using a novel, manually created dataset of 50 dispute scenarios, we conduct a blind evaluation comparing LLMs with human annotators across several key metrics. Overall, the LLMs showed strong performance, even outperforming our human annotators across dimensions. Specifically, in 62% of the cases, the LLMs chose intervention types that were rated as better than or equivalent to those chosen by humans. Moreover, in 84% of the cases, the intervention messages generated by the LLMs were rated as better than or equal to the intervention messages written by humans. LLMs likewise performed favourably on metrics such as impartiality, understanding and contextualization. Our results demonstrate the potential of integrating AI in online dispute resolution (ODR) platforms.

Paper Structure

This paper contains 19 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: A screenshot from the LLMediator, showing a dispute prior to the mediator's intervention.
  • Figure 2: Frequency of Intervention Types Chosen by LLM and Human
  • Figure 3: The bar chart shows the distribution of responses evaluating the performance of LLMs compared to humans across the five metrics we set.