Article Classification with Graph Neural Networks and Multigraphs

Khang Ly, Yury Kashnitsky, Savvas Chamezopoulos, Valeria Krzhizhanovskaya

TL;DR

The paper addresses the classification of scholarly articles into context-specific taxonomies by enriching Graph Neural Network (GNN) inputs with multi-graph representations that encode multiple signals of article relatedness as distinct edge types. It combines References, Authorship, Source, and Subject Area edges with LM-based textual node embeddings (SimTG, TAPE) and applies an R-GCN transformation so that GNN models can consume the heterogeneous input, reporting consistent performance gains on OGBN-arXiv and PubMed diabetes. The main contributions are a data-driven methodology for multi-graph construction, an extensive ablation study identifying robust edge-type configurations, and evidence that simple 2-layer GNNs fed enriched graphs can achieve results on par with more complex SOTA architectures. The approach offers a scalable, architecture-lean path to improved article classification using readily available metadata and lightweight models, with reproducibility resources provided.

Abstract

Classifying research output into context-specific label taxonomies is a challenging and relevant downstream task, given the volume of existing and newly published articles. We propose a method to enhance the performance of article classification by enriching simple Graph Neural Network (GNN) pipelines with multi-graph representations that simultaneously encode multiple signals of article relatedness, e.g. references, co-authorship, shared publication source, shared subject headings, as distinct edge types. Fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark OGBN-arXiv dataset and the PubMed diabetes dataset, augmented with additional metadata from Microsoft Academic Graph and PubMed Central, respectively. The results demonstrate that multi-graphs consistently improve the performance of a variety of GNN models compared to the default graphs. When deployed with SOTA textual node embedding methods, the transformed multi-graphs enable simple and shallow 2-layer GNN pipelines to achieve results on par with more complex architectures.
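To make the multi-graph idea concrete, the following is a minimal sketch of how article metadata could be turned into one edge set per relatedness signal. It is not the authors' pipeline: the toy records, the field names (`refs`, `authors`, `source`), and the helper functions are hypothetical, and treating references as undirected edges is an assumption made here for simplicity. A Subject Area edge type could be built with the same shared-attribute helper.

```python
from collections import defaultdict
from itertools import combinations

# Toy metadata. All identifiers and field names are invented for illustration.
papers = {
    "p1": {"refs": ["p2"], "authors": ["a1", "a2"], "source": "j1"},
    "p2": {"refs": [], "authors": ["a2"], "source": "j1"},
    "p3": {"refs": ["p1"], "authors": ["a3"], "source": "j2"},
}


def shared_attribute_edges(papers, key):
    """Link every pair of papers that share at least one value under `key`."""
    groups = defaultdict(set)
    for pid, meta in papers.items():
        values = meta[key] if isinstance(meta[key], list) else [meta[key]]
        for value in values:
            groups[value].add(pid)
    edges = set()
    for members in groups.values():
        # Sorting gives each undirected edge a single canonical orientation.
        edges.update(combinations(sorted(members), 2))
    return edges


def build_multigraph(papers):
    """Return one undirected edge set per edge type (assumption: refs are undirected)."""
    references = {
        tuple(sorted((pid, ref)))
        for pid, meta in papers.items()
        for ref in meta["refs"]
        if ref in papers  # ignore references to papers outside the dataset
    }
    return {
        "references": references,
        "authorship": shared_attribute_edges(papers, "authors"),
        "source": shared_attribute_edges(papers, "source"),
    }


multigraph = build_multigraph(papers)
for edge_type, edges in multigraph.items():
    print(edge_type, sorted(edges))
```

Keeping each signal in its own edge set is what lets a relational model apply separate parameters per edge type, and it makes it easy to drop or combine edge types in the style of an ablation study.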

Paper Structure

This paper contains 10 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Illustration of the proposed multi-graph input, which enables the neighboring feature aggregation for a node $X_1$ to be performed across a variety of subgraphs, leveraging multiple signals of article relatedness (References, Authorship, and shared Journal depicted here).
  • Figure 2: Degree distribution, i.e. frequency of each degree value, of all subgraphs for OGBN-arXiv (left) and PubMed (right), plotted on a log-log scale. Points indicate the unique degree values.
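The degree distribution described in the Figure 2 caption (the frequency of each degree value within a subgraph) can be computed as in the sketch below. The edge list is a hypothetical toy example, not data from OGBN-arXiv or PubMed, and the function assumes an undirected subgraph with no duplicate edges or self-loops.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of nodes that have that degree.

    Nodes that appear in no edge (degree 0) are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


# Hypothetical undirected subgraph for one edge type.
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3")]
dist = degree_distribution(edges)
for k in sorted(dist):
    print(f"degree {k}: {dist[k]} node(s)")
```

Plotting the resulting (degree, frequency) pairs on log-log axes gives one point per unique degree value, matching the caption's description. Degree-0 nodes are excluded here, which is consistent with a log scale since zero cannot be shown on it.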