From low resource information extraction to identifying influential nodes in knowledge graphs

Erica Cai, Olga Simek, Benjamin A. Miller, Danielle Sullivan-Pao, Evan Young, Christopher L. Smith

TL;DR

This work identifies important entities in intelligence texts by constructing a knowledge graph through fine-grained NER and relation extraction, in a setting where labeled data is scarce and entity type schemas change frequently. It introduces a novel few-shot fine-grained NER method based on text entailment (TE) to populate nodes and a zero-shot relation extraction (RE) approach to populate edges, then ranks nodes by centrality to identify key entities. The paper reports that the TE-based few-shot NER method outperforms baselines on Few-NERD, analyzes challenges in evaluating zero-shot RE, and shows, on both synthetic and real networks, that the effect of relation extraction errors on centrality depends on graph topology. These findings inform practical deployment and the design of evaluation resources for knowledge-graph approaches in security-related contexts, indicating when centrality estimates are reliable and when they should be treated with caution.

Abstract

We propose a pipeline for identifying important entities from intelligence reports that constructs a knowledge graph, where nodes correspond to entities of fine-grained types (e.g. traffickers) extracted from the text and edges correspond to extracted relations between entities (e.g. cartel membership). The important entities in intelligence reports then map to central nodes in the knowledge graph. We introduce a novel method that extracts fine-grained entities in a few-shot setting (few labeled examples), given limited resources available to label the frequently changing entity types that intelligence analysts are interested in. It outperforms other state-of-the-art methods. Next, we identify challenges facing previous evaluations of zero-shot (no labeled examples) methods for extracting relations, affecting the step of populating edges. Finally, we explore the utility of the pipeline: given the goal of identifying important entities, we evaluate the impact of relation extraction errors on the identification of central nodes in several real and synthetic networks. The impact of these errors varies significantly by graph topology, suggesting that confidence in measurements based on automatically extracted relations should depend on observed network features.
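
The last two steps of the pipeline (ranking nodes by centrality and checking how extraction errors change that ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which centrality measure is used, so degree centrality stands in here; the only error type simulated is a missed relation; and the triples, relation names, and function names are hypothetical rather than taken from the paper.

```python
import random
from collections import defaultdict


def build_graph(triples):
    """Build an undirected adjacency map from (head, relation, tail) triples."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        if head != tail:  # ignore self-loops
            adj[head].add(tail)
            adj[tail].add(head)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), where n is the number of nodes with an edge."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def top_k(scores, k):
    """Return the k highest-scoring nodes; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _score in ranked[:k]]


def drop_edges(triples, miss_rate, seed=0):
    """Simulate missed relations by dropping each triple with probability miss_rate."""
    rng = random.Random(seed)
    return [t for t in triples if rng.random() >= miss_rate]


def top_k_overlap(clean_triples, noisy_triples, k):
    """Fraction of the clean top-k nodes that remain in the noisy top-k."""
    clean = set(top_k(degree_centrality(build_graph(clean_triples)), k))
    noisy = set(top_k(degree_centrality(build_graph(noisy_triples)), k))
    return len(clean & noisy) / k


# Hypothetical extracted relations, for illustration only.
triples = [
    ("A", "member_of", "Cartel X"),
    ("B", "member_of", "Cartel X"),
    ("C", "member_of", "Cartel X"),
    ("C", "associate_of", "D"),
]

ranking = top_k(degree_centrality(build_graph(triples)), 2)
noisy_triples = drop_edges(triples, miss_rate=0.25, seed=1)
overlap = top_k_overlap(triples, noisy_triples, 2)
```

Repeating the overlap measurement over many seeds and over graphs of different shape is one way to see whether the most central nodes are stable under extraction errors for a given topology. Note that a node whose edges are all dropped disappears from the noisy graph in this sketch, which also changes the normalization.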

Paper Structure (12 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: The pipeline from text to knowledge graph network, which can be used to identify important entities.
  • Figure 2: Example input and output of few-shot fine-grained NER (first row), the first step, and of zero-shot relation extraction (second row), the second step.
  • Figure 3: Example of text entailment for fine-grained NER.
  • Figure 4: Steps of the NER approach, with examples in red.
  • Figure 5: Example of text entailment for zero-shot relation extraction.
  • ...and 1 more figure