PINE: Pipeline for Important Node Exploration in Attributed Networks

Elizaveta Kovtun, Maksim Makarenko, Natalia Semenova, Alexey Zaytsev, Semen Budennyy

TL;DR

PINE tackles the unsupervised identification of important nodes in attributed graphs by training a Graph Attention Network on a link-prediction task, allowing attention weights to reflect node influence. It jointly leverages structural connectivity and semantic node attributes, and extends to heterogeneous graphs via edge-type selection. Empirical results show that PINE outperforms traditional, topology-only baselines and competes with supervised methods on heterogeneous data, while proving effective in large-scale industrial scenarios such as patent networks and banking graphs. The authors also release the Patent Influence Dataset to support benchmarking and further research in industrial influence-detection tasks.
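The TL;DR says that PINE reads node influence off the attention weights of a Graph Attention Network. The sketch below shows where node attributes enter those weights. It computes single-head attention coefficients in the standard GAT formulation, e_ij = LeakyReLU(a^T [W h_i || W h_j]), and normalizes them with a softmax over each node's incoming neighbors. The weight matrix W, the attention vector a, and all function names are hypothetical. This is not PINE's exact architecture, and it does not cover the link-prediction training that learns W and a. Self-loops and multi-head attention are also omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    # Standard LeakyReLU as used in GAT attention scoring.
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(features, edges, W, a):
    """Single-head GAT-style attention (hypothetical helper).

    features: dict node -> feature vector (the node attributes)
    edges: list of (src, dst); information flows from src to dst
    W: shared linear projection (list of rows)
    a: attention vector of length 2 * len(W)
    Returns dict (src, dst) -> weight that dst assigns to src,
    normalized over all incoming edges of dst.
    """
    z = {v: matvec(W, h) for v, h in features.items()}
    # Unnormalized scores: a^T [W h_dst || W h_src], then LeakyReLU.
    by_dst = {}
    for src, dst in edges:
        concat = z[dst] + z[src]  # list concatenation
        score = leaky_relu(sum(ai * ci for ai, ci in zip(a, concat)))
        by_dst.setdefault(dst, []).append((src, score))
    # Numerically stable softmax over each destination's in-neighbors.
    alpha = {}
    for dst, items in by_dst.items():
        m = max(s for _, s in items)
        exps = [(src, math.exp(s - m)) for src, s in items]
        total = sum(e for _, e in exps)
        for src, e in exps:
            alpha[(src, dst)] = e / total
    return alpha


features = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
edges = [(1, 3), (2, 3), (3, 1)]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, 0.5, 1.0, -1.0]
alpha = gat_attention(features, edges, W, a)
print(alpha)
```

Because the scores depend on W h for both endpoints, two nodes with identical connectivity but different attributes can receive different attention. A purely topological measure cannot make that distinction.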

Abstract

A graph with semantically attributed nodes is a common data structure in a wide range of domains, such as interlinked web data or citation networks of scientific publications. An essential problem for such data is to identify the nodes that are more important than the rest, a task that markedly improves system monitoring and management. Traditional methods for identifying important nodes rely on centrality measures, such as node degree or the more complex PageRank. However, these measures consider only the network structure and neglect the rich node attributes. Recent methods adopt neural networks that can handle node features, but they require supervision. This work addresses the resulting gap, namely the absence of approaches that are both unsupervised and attribute-aware, by introducing a Pipeline for Important Node Exploration (PINE). At the core of the proposed framework is an attention-based graph model that incorporates node semantic features while learning the structural properties of the graph. PINE derives its node importance scores from the resulting attention distribution. We demonstrate the superior performance of PINE on various homogeneous and heterogeneous attributed networks. As an industry-implemented system, PINE tackles the real-world challenge of unsupervised identification of key entities within large-scale enterprise graphs.
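To make the contrast with the traditional baselines concrete, the following minimal sketch computes in-degree and PageRank (power iteration with uniform redistribution of dangling-node mass). Neither function takes node attributes as input, which is the limitation that the abstract points out. The function names, the damping factor of 0.85, and the edge convention ((u, v) means u points to v, e.g., u cites v) are illustrative assumptions. These are not the exact baseline configurations used in the paper.

```python
def in_degree(nodes, edges):
    # Count incoming edges per node; in a citation graph, this is the citation count.
    deg = {v: 0 for v in nodes}
    for _, dst in edges:
        deg[dst] += 1
    return deg


def pagerank(nodes, edges, damping=0.85, iters=100, tol=1e-12):
    # Classic PageRank by power iteration; uses only graph structure.
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Mass held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for u in out[v]:
                    new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


nodes = [0, 1, 2, 3]
edges = [(1, 0), (2, 0), (3, 0)]  # nodes 1, 2, 3 all cite node 0
print(in_degree(nodes, edges))
print(pagerank(nodes, edges))
```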

Paper Structure

This paper contains 40 sections, 16 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: PINE scheme. PINE is an approach for the unsupervised identification of important nodes in directed networks that takes node-level attributes into account. The importance score of a node is computed from the attention scores of a GAT model trained on a link prediction task. An edge directed from node $v_i$ to node $v_j$ denotes information flow from $v_i$ to $v_j$. For example, the PINE score of node 3 is the sum of the attention weights between node 3 and nodes 1, 2, and 4. These attention weights reflect how useful node 3 is as an information provider to its neighbors.
  • Figure 2: Patent citation graph for a physical photoresist model. Each approach highlights its own top-5 most important nodes.
  • Figure 3: Running time comparison on the PubMed dataset. PINE infers node importance scores in a reasonable time, while the sets of important nodes it produces are of much higher quality.
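The Figure 1 caption describes the scoring rule: a node's PINE score is the sum of the attention weights on the edges along which it acts as an information provider. The sketch below shows only that aggregation step. It assumes the attention weights have already been extracted from a trained GAT and stored as a dict keyed by (src, dst), where dst attends to src. The weights in the example are made up for illustration, and the helper names are hypothetical. This is not the authors' implementation, and it does not cover heterogeneous edge-type selection.

```python
def pine_scores(nodes, alpha):
    """Aggregate attention into node importance (hypothetical helper).

    alpha: dict (src, dst) -> attention weight that dst assigns to src,
    where information flows from src to dst. A node's score is the sum
    of the weights on all edges where it is the source (provider).
    """
    scores = {v: 0.0 for v in nodes}
    for (src, _dst), weight in alpha.items():
        scores[src] += weight
    return scores


def top_k(scores, k):
    # Highest score first; ties broken by node id for determinism.
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Illustrative weights, normalized per destination node.
# Node 3 provides information to nodes 1, 2, and 4, as in Figure 1.
nodes = [1, 2, 3, 4]
alpha = {
    (3, 1): 0.6, (4, 1): 0.4,
    (3, 2): 0.7, (1, 2): 0.3,
    (3, 4): 0.5, (2, 4): 0.5,
}
scores = pine_scores(nodes, alpha)
print(top_k(scores, 2))  # → [3, 2]
```

Summing the weights, rather than averaging them, rewards nodes that are useful to many neighbors as well as nodes that dominate a single neighbor's attention. Under this sketch, node 3 ranks first because it receives substantial attention from all three of its recipients.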