TextMine: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action

Chenyue Zhou, Gürkan Solmaz, Flavio Cirillo, Kiril Gashteovski, Jonathan Fürst

TL;DR

TextMine addresses knowledge extraction from unstructured humanitarian mine action reports by introducing a domain-specific dataset, ontology-guided LLM pipeline, and bias-aware evaluation. By combining layout-aware document chunking, ontology-aligned prompting, and multi-perspective evaluation against IMSMA Core and Empathi ontologies, it generates (subject, relation, object) triples faithful to source text. Results show ontology-aligned prompts boost extraction accuracy and reduce hallucinations, while a bias-aware LLM-as-Judge enables effective reference-free evaluation that tracks close to ground truth. The work enables safer, transferable information sharing across HMA agencies and provides reproducible data and code to drive future research.
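As a rough illustration of the ontology-aligned prompting and triple extraction summarized above, the following Python sketch assembles a one-shot prompt from a text chunk and a list of allowed relations, then parses model output and rejects lines that are malformed or use a relation outside the ontology. Everything here is an assumption for illustration: the toy ontology, the prompt wording, the one-triple-per-line output format, and the function names `build_prompt` and `parse_triples` are not taken from the paper, IMSMA Core, or Empathi.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Toy ontology invented for illustration: relation -> (domain class, range class).
# It is NOT taken from IMSMA Core or Empathi.
TOY_ONTOLOGY: Dict[str, Tuple[str, str]] = {
    "locatedIn": ("Minefield", "Location"),
    "clearedBy": ("Minefield", "Organization"),
}

# Assumed output format: one "(subject, relation, object)" per line.
TRIPLE_RE = re.compile(
    r"^\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)$"
)


def build_prompt(chunk: str, ontology: Dict[str, Tuple[str, str]], demo: str) -> str:
    """Combine an instruction, the allowed relations, one demonstration and the chunk."""
    relations = "\n".join(
        f"- {rel}({dom}, {rng})" for rel, (dom, rng) in sorted(ontology.items())
    )
    return (
        "Extract (subject, relation, object) triples from the text.\n"
        "Use only these relations:\n"
        f"{relations}\n\n"
        f"Example:\n{demo}\n\n"
        f"Text:\n{chunk}\n"
        "Triples:"
    )


def parse_triples(
    output: str, ontology: Dict[str, Tuple[str, str]]
) -> Tuple[List[Triple], List[str]]:
    """Split model output into ontology-conformant triples and rejected lines."""
    kept: List[Triple] = []
    rejected: List[str] = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TRIPLE_RE.match(line)
        # Keep a line only if it is well-formed AND its relation is in the ontology.
        if match and match.group(2) in ontology:
            kept.append((match.group(1), match.group(2), match.group(3)))
        else:
            rejected.append(line)
    return kept, rejected
```

The rejected list gives a simple handle on format adherence and ontology conformance; it says nothing about faithfulness to the source text, which needs a separate check against the chunk.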

Abstract

Humanitarian Mine Action (HMA) addresses the challenge of detecting and removing landmines from conflict regions. Much of the life-saving operational knowledge produced by HMA agencies is buried in unstructured reports, limiting the transferability of information between agencies. To address this issue, we propose TextMine: the first dataset, evaluation framework and ontology-guided large language model (LLM) pipeline for knowledge extraction in the HMA domain. TextMine structures HMA reports into (subject, relation, object) triples, thus creating domain-specific knowledge. To ensure real-world relevance, we created the dataset in collaboration with the Cambodian Mine Action Center (CMAC). We further introduce a bias-aware evaluation framework that combines human-annotated triples with an LLM-as-Judge protocol to mitigate position bias in reference-free scoring. Our experiments show that ontology-aligned prompts improve extraction accuracy by up to 44.2%, reduce hallucinations by 22.5%, and enhance format adherence by 20.9% compared to baseline models. We publicly release the dataset and code.
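The abstract does not spell out how the LLM-as-Judge protocol mitigates position bias. One common approach, shown here only as a hedged sketch, is to present the candidate triples to the judge in several orders and average each triple's score, so that no triple benefits from always appearing first. The `Judge` interface and the function `position_debiased_scores` are hypothetical names; the rotation scheme is an assumption and may differ from the authors' protocol.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical judge interface: given the source text and an ordered list of
# triples, return one score per triple, in the same order as presented.
Judge = Callable[[str, List[Triple]], List[float]]


def position_debiased_scores(
    text: str, triples: List[Triple], judge: Judge
) -> List[float]:
    """Average judge scores over all cyclic rotations of the triple list.

    With n triples, each triple occupies every position exactly once, so any
    purely position-dependent component of the score is the same for all.
    This costs n judge calls; an assumption-laden sketch, not the paper's method.
    """
    n = len(triples)
    totals = [0.0] * n
    for shift in range(n):
        # order[pos] is the index of the triple shown at position pos.
        order = [(shift + k) % n for k in range(n)]
        scores = judge(text, [triples[i] for i in order])
        for pos, idx in enumerate(order):
            totals[idx] += scores[pos]
    return [total / n for total in totals] if n else []
```

For long triple lists, sampling a few random orderings instead of all rotations would reduce the number of judge calls, at the price of only approximately balancing positions.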

Paper Structure

This paper contains 30 sections, 3 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Example knowledge graph triple extraction from texts, guided by the HMA ontology. The task is to extract as many triples as possible while ensuring ontology conformance and source-text faithfulness.
  • Figure 2: Semi-automatic creation of the humanitarian mine action dataset. The dataset contains a large and diverse set of technical reports from global mine action and a smaller LLM- and human-annotated portion for Cambodian mine action.
  • Figure 3: TextMine Overview. Reports are preprocessed into paragraph chunks, used as test inputs during inference. Each chunk is combined with an instruction template and ontology, then passed through LLMs for triple extraction. We apply a multi-perspective evaluation using both reference-based and reference-free methods. Extracted triples are stored in a database, queried by developers and HMA domain experts.
  • Figure 4: Combined visualization of the impact of model selection and prompt strategy on extraction performance. Abbreviations on the x-axis denote one-shot prompting with: RS = Random Sentences; RP = Random Paragraphs; OS = Ontology-Aligned Sentence; OP = Ontology-Aligned Paragraph. The top Combined Score is achieved by Llama3-70B (93.24), closely followed by GPT-4o (93.13), both with the OS prompt setting.
  • Figure 5: Accuracy metric scores across prompt types for each model. OS demonstration prompts consistently result in the best accuracy across all four metrics and all five models. Note: ROUGE and METEOR are scaled by 150 and BERTScore by 100 for better visibility.
  • ...and 2 more figures