Multi-label Node Classification On Graph-Structured Data

Tianqi Zhao; Ngan Thi Dong; Alan Hanjalic; Megha Khosla

TL;DR

This work tackles multi-label node classification on graph-structured data, highlighting the scarcity of public datasets and the distinct semantics of homophily in multi-label contexts. It provides three real-world biological datasets and a synthetic generator with tunable properties, plus a framework for analyzing homophily and Cross-Class Neighborhood Similarity (CCNS), across nine datasets and eight methods. Large-scale experiments reveal that simple baselines can outperform some GNNs on several datasets and that conventional AUROC evaluation can be misleading in sparse, multi-label settings, motivating the use of Average Precision. The authors publicly release a comprehensive benchmark to advance standardized evaluation in multi-label graph learning.
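To make the AUROC versus Average Precision point concrete, here is a minimal, self-contained sketch on a toy ranking. It is not taken from the benchmark: the data (10 positives among 1000 samples, with 50 negatives ranked above every positive) and the helper names `auroc` and `average_precision` are assumptions chosen purely for illustration.

```python
# Toy illustration (assumed data, not from the paper): a sparse label where
# AUROC looks strong while Average Precision reveals poor ranking quality.

def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of precision@k, evaluated at the rank k of every positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits

# 1000 samples sorted by descending score: 50 negatives come first,
# then the 10 positives, then the remaining 940 negatives.
labels = [0] * 50 + [1] * 10 + [0] * 940
scores = [float(1000 - i) for i in range(1000)]

auroc_value = auroc(labels, scores)
ap_value = average_precision(labels, scores)
print(f"AUROC = {auroc_value:.3f}, AP = {ap_value:.3f}")  # AUROC = 0.949, AP = 0.097
```

AUROC is dominated by the 940 negatives that are correctly ranked below the positives, whereas Average Precision is driven by the 50 false positives at the top of the ranking, which is the part of the ranking that matters when positives are rare.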

Abstract

Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the collected $9$ multi-label datasets. Finally, we perform a large-scale comparative study with $8$ methods and $9$ datasets and analyse the performance of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
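The abstract does not spell out how homophily is defined for nodes with several labels. As a hedged sketch, one natural option is to average the Jaccard similarity of the label sets over all edges; this choice, the function names `jaccard` and `label_homophily`, the toy graph, and the convention that two unlabeled nodes have similarity 0 are all assumptions made here for illustration, not a statement of the paper's exact definition.

```python
# Sketch of an edge-averaged, Jaccard-based label homophily for multi-label
# graphs. The measure and the toy data are illustrative assumptions.

def jaccard(a, b):
    """Jaccard similarity of two label sets; defined as 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def label_homophily(edges, labels):
    """Average Jaccard similarity of the label sets of connected node pairs."""
    sims = [jaccard(labels[u], labels[v]) for u, v in edges]
    return sum(sims) / len(sims)

# Hypothetical toy graph: node id -> set of labels (node 3 has no labels).
labels = {0: {"a", "b"}, 1: {"a"}, 2: {"b", "c"}, 3: set()}
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

h = label_homophily(edges, labels)
print(f"label homophily = {h:.3f}")  # label homophily = 0.208
```

Unlike the multi-class case, where an edge either connects same-class nodes or not, each edge here contributes a graded value between 0 and 1, so two nodes that share only some of their labels are neither fully homophilic nor fully heterophilic.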

Paper Structure (28 sections, 5 equations, 16 figures, 10 tables)

Introduction
Background and Related Work
Notations and The Problem Setting
Related Work
Multi-label Classification On Graph-structured Data
Multi-label Classification On Non-Graph-Structured Data Using GNNs
A detailed analysis of existing and new datasets
Existing datasets
New biological interaction datasets
Multi-label Graph Generator Framework
Experiments
Results and Discussion
Results on real-world datasets
Results on synthetic datasets
Effect of varying feature quality
...and 13 more sections

Figures (16)

Figure 1: Label distributions. In BlogCat, the majority of the nodes have one label. In OGB-Proteins, around $41$% of total nodes have no labels, and only three nodes have the maximum number of $100$ labels.
Figure 2: Cross-class Neighborhood Similarity in real-world datasets
Figure 3: Label distributions in biological datasets. The majority of the nodes in all datasets have one label.
Figure 4: Cross-class Neighborhood Similarity in real-world datasets and proposed biological datasets
Figure 5: Cross-class Neighborhood Similarity in hypersphere datasets with varying label homophily
...and 11 more figures

Theorems & Definitions (2)

Definition 1
Definition 2
