Automating Violence Detection and Categorization from Ancient Texts

Alhassan Abdelhalim, Michaela Regneri

TL;DR

The paper tackles the challenge of automating violence detection and multi-dimensional categorization in ancient texts by leveraging fine-tuned large language models on the ERIS dataset, with supplementary data from Perseus. It demonstrates that fine-tuning, especially with data augmentation, yields high performance for detecting violent passages ($F_1$ up to $0.93$) and robust results ($F_1$ around $0.8$) for categorizing violence across context, motive, and long-term consequences. The work also benchmarks a zero-shot GPT-4o mini baseline and discusses the comparative advantages of fine-tuned LLMs versus API-based approaches, advocating a hybrid human-in-the-loop workflow for large-scale historical analysis. Overall, the framework promises to substantially accelerate humanities research by enabling rapid, scalable extraction and annotation of violence data while preserving interpretive depth through expert validation.

Abstract

Violence descriptions in literature offer valuable insights for a wide range of research in the humanities. For historians, depictions of violence are of special interest for analyzing the societal dynamics surrounding large wars and individual conflicts of influential people. Harvesting data for violence research manually is laborious and time-consuming. This study is the first one to evaluate the effectiveness of large language models (LLMs) in identifying violence in ancient texts and categorizing it across multiple dimensions. Our experiments identify LLMs as a valuable tool to scale up the accurate analysis of historical texts and show the effect of fine-tuning and data augmentation, yielding an F1-score of up to 0.93 for violence detection and 0.86 for fine-grained violence categorization.

Paper Structure

This paper contains 20 sections, 3 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: An entry from ERIS titled "Alexander kills Cleitus with a spear".
  • Figure 2: Data Preprocessing Pipeline for Violence Detection.
  • Figure 3: An example of our text augmentation. This approach effectively quadruples the training data.