Article Classification with Graph Neural Networks and Multigraphs

Khang Ly; Yury Kashnitsky; Savvas Chamezopoulos; Valeria Krzhizhanovskaya

TL;DR

The paper tackles the classification of scholarly articles into context-specific taxonomies by enriching Graph Neural Network inputs with multi-graph representations that encode several signals of article relatedness at once. It combines four edge types (References, Authorship, Source, and Subject Area) with state-of-the-art LM-based textual embeddings (SimTeG, TAPE) and applies an R-GCN transformation so that heterogeneous inputs can be used, showing consistent performance gains on the OGBN-arXiv and PubMed diabetes datasets. The main contributions are a data-driven methodology for constructing multi-graphs, an extensive ablation study that identifies robust edge-type configurations, and evidence that simple 2-layer GNNs fed with enriched graphs achieve results competitive with SOTA methods. The approach offers a scalable, architecture-lean path to better article classification that relies on readily available metadata and lightweight models, and reproducibility resources are provided.

Abstract

Classifying research output into context-specific label taxonomies is a challenging and relevant downstream task, given the volume of existing and newly published articles. We propose a method to enhance the performance of article classification by enriching simple Graph Neural Network (GNN) pipelines with multi-graph representations that simultaneously encode multiple signals of article relatedness, e.g. references, co-authorship, shared publication source, shared subject headings, as distinct edge types. Fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark OGBN-arXiv dataset and the PubMed diabetes dataset, augmented with additional metadata from Microsoft Academic Graph and PubMed Central, respectively. The results demonstrate that multi-graphs consistently improve the performance of a variety of GNN models compared to the default graphs. When deployed with SOTA textual node embedding methods, the transformed multi-graphs enable simple and shallow 2-layer GNN pipelines to achieve results on par with more complex architectures.
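To make the idea of distinct edge types concrete, the following is a minimal Python sketch of how a multi-graph could be assembled from article metadata. It is not the authors' implementation. The toy records, the field names (`refs`, `authors`, `source`), the function names, and the choice to treat references as undirected are all assumptions made for illustration. Subject Area edges would follow the same shared-value pattern and are left out for brevity. The second function computes a per-relation mean over neighbor features, which only loosely mirrors the relation-wise aggregation in R-GCN and contains no learned weights.

```python
from collections import defaultdict
from itertools import combinations

# Toy metadata; every field name and value here is hypothetical.
ARTICLES = {
    0: {"refs": [1], "authors": ["a1", "a2"], "source": "J1"},
    1: {"refs": [], "authors": ["a2"], "source": "J2"},
    2: {"refs": [0], "authors": ["a3"], "source": "J1"},
}


def build_multigraph(articles):
    """Return {edge_type: set of undirected (i, j) pairs with i < j}."""
    edges = defaultdict(set)

    # References: one edge per citation (assumed undirected in this sketch).
    for i, meta in articles.items():
        for j in meta["refs"]:
            if j in articles and j != i:
                edges["references"].add((min(i, j), max(i, j)))

    # Authorship / Source: connect every pair of articles sharing a value.
    for edge_type, field in (("authorship", "authors"), ("source", "source")):
        groups = defaultdict(set)
        for i, meta in articles.items():
            raw = meta[field]
            values = raw if isinstance(raw, list) else [raw]
            for value in values:
                groups[value].add(i)
        for members in groups.values():
            for i, j in combinations(sorted(members), 2):
                edges[edge_type].add((i, j))

    return dict(edges)


def relation_mean(edges, features):
    """Per-relation mean of neighbor features: {edge_type: {node: vector}}."""
    out = {}
    for edge_type, pairs in edges.items():
        neighbors = defaultdict(list)
        for i, j in pairs:
            neighbors[i].append(features[j])
            neighbors[j].append(features[i])
        out[edge_type] = {
            node: [sum(column) / len(vectors) for column in zip(*vectors)]
            for node, vectors in neighbors.items()
        }
    return out


GRAPH = build_multigraph(ARTICLES)
```

Keeping one edge set per relation lets each relatedness signal be switched on or off independently, which is the kind of configuration an edge-type ablation needs.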
