Knowledge Acquisition on Mass-shooting Events via LLMs for AI-Driven Justice
Benign John Ihugba, Afsana Nasrin, Ling Wu, Lin Li, Lijun Qian, Xishuang Dong
TL;DR
This paper tackles the problem of extracting structured knowledge from mass-shooting event narratives to support AI-driven justice. It constructs the first mass-shooting NER dataset with 41 subcategories across offense, offender, victim, and environment, sourced from Mother Jones and annotated via a BIO scheme. Using few-shot prompts, the authors benchmark GPT-3.5, GPT-4o, and o1-mini, finding that GPT-4o delivers the strongest Micro Precision, Recall, and F1, while o1-mini provides a more efficient alternative; performance generally improves with more shots. The work demonstrates the viability of LLM-based NER for domain-specific information extraction and points toward building knowledge graphs for investigations and policy, with future work focusing on minority-category extraction and relation mining for richer knowledge graphs.
Abstract
Mass-shooting events pose a significant challenge to public safety, generating large volumes of unstructured textual data that hinder effective investigations and the formulation of public policy. Despite the urgency, few prior studies have effectively automated the extraction of key information from these events to support legal and investigative efforts. This paper presented the first dataset designed for knowledge acquisition on mass-shooting events through the application of named entity recognition (NER) techniques. It focuses on identifying key entities such as offenders, victims, locations, and criminal instruments, that are vital for legal and investigative purposes. The NER process is powered by Large Language Models (LLMs) using few-shot prompting, facilitating the efficient extraction and organization of critical information from diverse sources, including news articles, police reports, and social media. Experimental results on real-world mass-shooting corpora demonstrate that GPT-4o is the most effective model for mass-shooting NER, achieving the highest Micro Precision, Micro Recall, and Micro F1-scores. Meanwhile, o1-mini delivers competitive performance, making it a resource-efficient alternative for less complex NER tasks. It is also observed that increasing the shot count enhances the performance of all models, but the gains are more substantial for GPT-4o and o1-mini, highlighting their superior adaptability to few-shot learning scenarios.
