ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs

Debashis Gupta, Aditi Golder, Luis Fernendez, Miles Silman, Greg Lersen, Fan Yang, Bob Plemmons, Sarra Alqahtani, Paul Victor Pauca

TL;DR

ASGM-KG addresses the fragmentation of knowledge on artisanal and small-scale gold mining (ASGM) by introducing a domain-informed knowledge graph built from 9 expert-selected sources. It uses a Text2KGBench-guided LLM workflow to extract 2,653 RDF triples, then applies the unsupervised Data Assessment Semantics (DAS) framework to fact-check the extracted triples through search-driven page summarization and majority voting. The current release contains 1,899 DAS-validated triples across 1,650 entities and 785 relations, with 43% new entities and 29% new relations, and supports three downstream tasks: Q&A in Neo4j, subgraph summarization, and natural-language chat via Llama-3-70b-chat. The resource is publicly accessible and is intended to grow through additional documents and expert-in-the-loop curation, with the aim of aiding governance, policy design, and interdisciplinary research on ASGM and its environmental impacts.
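The majority-voting step mentioned above can be pictured with a minimal sketch. It assumes each triple has already received one boolean verdict per retrieved page summary; the `Triple` type, the function names, the tie-breaking rule, and the sample data are all hypothetical and are not taken from the DAS implementation or from ASGM-KG.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A subject-relation-object statement, as in an RDF triple."""
    subject: str
    relation: str
    obj: str


def majority_vote(verdicts: List[bool]) -> bool:
    # True only when strictly more than half of the verdicts are True.
    # Assumption: ties and triples with no evidence are rejected.
    return sum(verdicts) * 2 > len(verdicts)


def keep_validated(votes: Dict[Triple, List[bool]]) -> List[Triple]:
    # Retain the triples whose per-page verdicts pass the majority vote.
    return [triple for triple, verdicts in votes.items() if majority_vote(verdicts)]


# Illustrative data only; these triples and verdicts are not from ASGM-KG.
votes = {
    Triple("ASGM", "uses", "mercury"): [True, True, False],
    Triple("ASGM", "occurs_in", "Antarctica"): [False, False, True],
    Triple("La Pampa", "located_in", "Madre de Dios"): [True, False],
}
validated = keep_validated(votes)
```

Rejecting ties is a conservative choice made for this sketch: a triple stays in the graph only when the supporting verdicts clearly outnumber the rest.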

Abstract

Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly destructive mining practice, leading to environmental disasters across the world's tropical watersheds. The topic of ASGM spans multiple domains of research and information, including natural and social systems, and knowledge is often atomized across a diversity of media and documents. We therefore introduce a knowledge graph (ASGM-KG) that consolidates and provides crucial information about ASGM practices and their environmental effects. The current version of ASGM-KG consists of 1,899 triples extracted using a large language model (LLM) from documents and reports published by both non-governmental and governmental organizations. These documents were carefully selected by a group of tropical ecologists with expertise in ASGM. This knowledge graph was validated using two methods. First, a small team of ASGM experts reviewed and labeled triples as factual or non-factual. Second, we devised and applied an automated factual reduction framework that relies on a search engine and an LLM for labeling triples. Our framework performs as well as five baselines on a publicly available knowledge graph and achieves over 90% accuracy on our ASGM-KG validated by domain experts. ASGM-KG demonstrates an advancement in knowledge aggregation and representation for complex, interdisciplinary environmental crises such as ASGM.
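The accuracy figure in the abstract compares automated labels against expert labels. As a rough sketch of that comparison, assuming both label sets are aligned lists of factual/non-factual booleans, accuracy is the fraction of triples on which the two agree. The function name and the toy labels below are hypothetical and do not reproduce the paper's evaluation data.

```python
from typing import List


def accuracy(predicted: List[bool], expert: List[bool]) -> float:
    # Fraction of triples where the automated label matches the expert label.
    if len(predicted) != len(expert):
        raise ValueError("label lists must have the same length")
    if not expert:
        raise ValueError("at least one labeled triple is required")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


# Toy labels for illustration only (True = factual, False = non-factual).
automated = [True, True, False, True]
expert = [True, False, False, True]
score = accuracy(automated, expert)  # 3 of 4 labels agree
```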

Paper Structure (11 sections, 2 figures, 3 tables, 2 algorithms)

Figures (2)

  • Figure 1: Aerial view of La Pampa, an ASGM hot zone in Madre de Dios, Perú. (Photo by Jorge Caballero)
  • Figure 2: Data Assessment Semantics framework: Automated process for factual validation via open-source knowledge.