Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations

Zilin Ma; Susannah Su; Nathan Zhao; Linn Bieske; Blake Bullwinkel; Yanyi Zhang; Sophia Yang; Ziqing Luo; Siyao Li; Gekai Liao; Boxiang Wang; Jinglun Gao; Zihan Wen; Claude Bruderlein; Weiwei Pan

TL;DR

This paper investigates the use of large language models (LLMs) to support humanitarian frontline negotiations. The authors prompt GPT-4-based tools to auto-fill established synthesis templates (Island of Agreement, Iceberg/CSS, and Stakeholder Mapping) from real-case materials and benchmark the results against practitioner outputs. The LLM-generated case analyses are stable across repeated calls, comparable to practitioner analyses, and substantially faster to produce. Interviews with 13 experienced negotiators identify two core use cases, context analysis and ideation augmentation, and surface critical concerns about confidentiality, bias, accuracy, and adoption. The findings suggest that, with careful governance, prompt engineering, and human oversight, LLMs can speed up information synthesis and improve preparedness and strategy in humanitarian negotiations without replacing human judgment.

Abstract

Humanitarian negotiations in conflict zones, called "frontline negotiation", are often highly adversarial, complex, and high-risk. Several best practices have emerged over the years that help negotiators extract insights from large datasets to navigate nuanced and rapidly evolving scenarios. Recent advances in large language models (LLMs) have sparked interest in the potential for AI to aid decision making in frontline negotiation. Through in-depth interviews with 13 experienced frontline negotiators, we identified their needs for AI-assisted case analysis and creativity support, as well as their concerns surrounding confidentiality and model bias. We further explored the potential for AI augmentation of three standard tools used in frontline negotiation planning, and evaluated the quality and stability of our ChatGPT-based negotiation tools on two real cases. Our findings highlight the potential for LLMs to enhance humanitarian negotiations and underscore the need for careful ethical and practical considerations.

Paper Structure (40 sections, 11 figures, 1 table)

Introduction
Related Work
Frontline Negotiation
Templates for Synthesizing Information in Frontline Negotiation
Method
Quantitative Analysis of LLMs Populating Frontline Negotiation Templates
Human-Centered Benchmarking and Evaluation of Tools
Quantitative Results
Interview Results: Opportunities and Concerns of Using LLMs in the Frontline
Opportunities
Concerns
Discussion
Conclusion
Appendix
Synthesizing Tool Development
...and 25 more sections

Figures (11)

Figure 1: Response Generation and Analysis Pipeline
Figure 2: Cosine similarity between practitioner and GPT responses under the IoA framework for the Health for All case. ChatGPT was called 30 times with the same prompt, and each response was compared with the practitioner response using BERT. Scores ranged from 0.91 to 0.94 (mean 0.93), indicating consistently close alignment with practitioner responses.
Figure 3: Graphic Example of Iceberg and CSS Framework (CCHN, 2019)
Figure 4: Example of Stakeholder Mapping (CCHN, 2019)
Figure 5: Cosine Similarity of Iceberg and CSS on FwB Case. This heatmap visualizes the pairwise cosine similarity scores between ChatGPT responses for the Iceberg/CSS framework-based (FwB) case. The texts were processed using BERT, and the heatmap shows high consistency, with median similarity scores above 0.98, indicating stable outputs across 30 calls.
...and 6 more figures
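
The two measurements described in the captions for Figures 2 and 5 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each response has already been turned into an embedding vector (the paper uses BERT for this step, which is omitted here), and the function names and toy vectors are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, median


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_with_reference(reference, responses):
    """Similarity of each model response to one practitioner response (Figure 2 style)."""
    return [cosine(reference, r) for r in responses]


def pairwise_stability(responses):
    """Similarity between every pair of model responses (Figure 5 style)."""
    return [cosine(u, v) for u, v in combinations(responses, 2)]


# Toy stand-ins for embeddings; real vectors would come from an embedding model.
practitioner = [1.0, 0.0, 1.0]
responses = [[1.0, 0.1, 0.9], [0.9, 0.0, 1.0], [1.0, 0.0, 1.0]]

scores = alignment_with_reference(practitioner, responses)
pairs = pairwise_stability(responses)

print("alignment: mean", round(mean(scores), 3),
      "min", round(min(scores), 3), "max", round(max(scores), 3))
print("stability: median pairwise", round(median(pairs), 3))
```

With 30 repeated calls, the first function yields the 30 alignment scores summarized in Figure 2, and the second yields the 435 pairwise scores that a heatmap such as Figure 5 displays.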
