TextMine: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action

Chenyue Zhou; Gürkan Solmaz; Flavio Cirillo; Kiril Gashteovski; Jonathan Fürst

TL;DR

TextMine addresses knowledge extraction from unstructured humanitarian mine action (HMA) reports by introducing a domain-specific dataset, an ontology-guided LLM pipeline, and a bias-aware evaluation framework. By combining layout-aware document chunking and ontology-aligned prompting with multi-perspective evaluation against the IMSMA Core and Empathi ontologies, it generates (subject, relation, object) triples that stay faithful to the source text. Results show that ontology-aligned prompts boost extraction accuracy and reduce hallucinations, while a bias-aware LLM-as-Judge enables reference-free evaluation that closely tracks ground truth. The work supports safer, more transferable information sharing across HMA agencies and provides reproducible data and code for future research.

Abstract

Humanitarian Mine Action (HMA) addresses the challenge of detecting and removing landmines from conflict regions. Much of the life-saving operational knowledge produced by HMA agencies is buried in unstructured reports, limiting the transferability of information between agencies. To address this issue, we propose TextMine: the first dataset, evaluation framework and ontology-guided large language model (LLM) pipeline for knowledge extraction in the HMA domain. TextMine structures HMA reports into (subject, relation, object) triples, thus creating domain-specific knowledge. To ensure real-world relevance, we created the dataset in collaboration with the Cambodian Mine Action Center (CMAC). We further introduce a bias-aware evaluation framework that combines human-annotated triples with an LLM-as-Judge protocol to mitigate position bias in reference-free scoring. Our experiments show that ontology-aligned prompts improve extraction accuracy by up to 44.2%, reduce hallucinations by 22.5%, and enhance format adherence by 20.9% compared to baseline models. We publicly release the dataset and code.
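To make the (subject, relation, object) representation concrete, the following is a minimal sketch of how extracted triples might be filtered against an ontology's relation vocabulary and then scored against human-annotated triples. It is not the TextMine implementation: the `Triple` class, the helper functions, the relation names `locatedIn` and `clearedBy`, and the use of case-insensitive exact matching are all illustrative assumptions and are not taken from the paper or from the IMSMA Core or Empathi ontologies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (subject, relation, object) fact extracted from a report."""
    subject: str
    relation: str
    obj: str


def normalize(t: Triple) -> Triple:
    # Assumption: matching is case-insensitive and ignores outer whitespace.
    return Triple(t.subject.strip().lower(),
                  t.relation.strip().lower(),
                  t.obj.strip().lower())


def filter_by_ontology(triples, allowed_relations):
    """Keep only triples whose relation appears in the allowed vocabulary."""
    allowed = {r.strip().lower() for r in allowed_relations}
    return [t for t in triples if t.relation.strip().lower() in allowed]


def precision_recall(predicted, reference):
    """Exact-match precision and recall of predicted vs. annotated triples."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical relation vocabulary and example data, for illustration only.
ALLOWED_RELATIONS = {"locatedIn", "clearedBy"}

predicted = [
    Triple("Minefield A", "locatedIn", "Battambang"),
    Triple("Minefield A", "clearedBy", "CMAC"),
    Triple("Minefield A", "paintedBy", "CMAC"),  # relation not in vocabulary
]
reference = [
    Triple("Minefield A", "locatedIn", "Battambang"),
    Triple("Minefield A", "clearedBy", "CMAC"),
    Triple("Minefield B", "locatedIn", "Battambang"),
]

kept = filter_by_ontology(predicted, ALLOWED_RELATIONS)
precision, recall = precision_recall(kept, reference)
```

Exact matching is the simplest possible scoring rule and is chosen here only for brevity; a reference-free judge, as described in the abstract, would replace the comparison against `reference` entirely.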
