Rethinking Independent Cross-Entropy Loss For Graph-Structured Data

Rui Miao; Kaixiong Zhou; Yili Wang; Ninghao Liu; Ying Wang; Xin Wang

TL;DR

This work addresses the mismatch between graph-structured data and the independent cross-entropy loss by introducing joint-cluster supervised learning, which models the joint distribution $p(y_i, \bar{y}_m | z_i, \bar{z}_m; \theta)$ between a node and its cluster. It trains end-to-end with a joint-cluster cross-entropy loss that uses cluster-level reference signals and infers node labels via marginalization over the cluster dimension, thereby reducing over-confident predictions and improving robustness. Extensive experiments across small and large graphs, including imbalanced and heterophilic settings, show consistent accuracy gains and enhanced resilience to adversarial attacks, with favorable efficiency relative to CRF-based approaches. The method leverages METIS clustering to capture community structure and demonstrates calibration improvements, offering a scalable path to more reliable GNNs in real-world graph applications.
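
The marginalization step named above can be written out explicitly. This is a sketch, assuming the node label $y_i$ and the cluster label $\bar{y}_m$ range over the same $C$ classes; the index $c'$ and the symbol $\hat{y}_i$ are notation introduced here for illustration only.

$$
p(y_i = c \mid z_i, \bar{z}_m; \theta) = \sum_{c'=1}^{C} p(y_i = c, \bar{y}_m = c' \mid z_i, \bar{z}_m; \theta), \qquad \hat{y}_i = \arg\max_{c} \; p(y_i = c \mid z_i, \bar{z}_m; \theta)
$$

Summing the joint prediction over every possible cluster label leaves a distribution over node classes alone, from which the predicted class is read off.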

Abstract

Graph neural networks (GNNs) have exhibited prominent performance in learning graph-structured data. For the node classification task, traditional supervised learning relies on the i.i.d. assumption among node labels: it simply sums the cross-entropy losses of the independent training nodes and uses the average loss to optimize the GNN's weights. Unlike other data formats, however, the nodes of a graph are naturally connected. We find that modeling node labels as independently distributed restricts GNNs' capability to generalize over the entire graph and to defend against adversarial attacks. In this work, we propose a new framework, termed joint-cluster supervised learning, to model the joint distribution of each node with its corresponding cluster. We learn the joint distribution of node and cluster labels conditioned on their representations, and train GNNs with the resulting joint loss. In this way, the data-label reference signals extracted from the local cluster explicitly strengthen the discrimination ability on the target node. Extensive experiments demonstrate that joint-cluster supervised learning effectively improves GNNs' node classification accuracy. Furthermore, because the reference signals may be free from malicious interference, our learning paradigm significantly protects node classification against adversarial attacks.

Paper Structure (46 sections, 6 equations, 4 figures, 19 tables, 1 algorithm)

Introduction
Preliminary of GNNs and Supervised Learning
Graph Neural Networks
Independent Cross-entropy Loss
Joint-cluster Supervised Learning
Training with joint-cluster loss.
Node class inference in joint distribution.
Related Work
Label propagation.
Conditional random fields.
Experiments
Evaluation on Small Graph Datasets
Implementation.
Q: Whether our proposals outperform the standard supervised learning on the easy and small datasets?
Evaluation on Large Graph Datasets
...and 31 more sections

Figures (4)

Figure 1: An illustration of our joint-cluster supervised learning framework: First, we obtain node embeddings through the encoder. Then the cluster embedding and label are generated through the divided graph structure. Then the node embedding and the cluster embedding are concatenated and fed into the classifier to obtain a joint distribution prediction. Finally, the joint-cluster loss and marginalization are used for training and inference.
Figure 2: Left, Middle: Node representation visualization by t-SNE for 8-layer GCN trained by CE loss (left) and JC loss (middle) on Cora. Right: Normalized comparison of the gap between train and test losses on ogbn-arxiv.
Figure 3: Effect of the cluster-number hyperparameter in joint-cluster supervised learning. For Yelp, the notation a/b gives two values: a is the cluster number used in Cluster-GCN and GraphSAGE, and b is the cluster number used in SIGN, which uses a larger batch size.
Figure 4: Over-smoothing analysis about the model depth for node classification.
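
The pipeline described in the Figure 1 caption (node embedding, cluster embedding and label, concatenation, joint prediction, joint-cluster loss, marginalization) can be sketched on a toy cluster. This is a minimal illustration under stated assumptions, not the authors' implementation: the cluster embedding and cluster label are assumed to be means over the cluster's training nodes, the joint target is assumed to be the outer product of the node's one-hot label and the cluster label, and the one-layer `linear` classifier, its weights, and all function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_rows(rows):
    """Column-wise mean of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def one_hot(label, num_classes):
    return [1.0 if c == label else 0.0 for c in range(num_classes)]

def linear(weights, bias, x):
    """Hypothetical one-layer classifier: one weight row per joint logit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def joint_target(node_label, cluster_label):
    """Assumed joint label: outer product of the one-hot node label and the
    soft cluster label, flattened row-major into C*C entries."""
    num_classes = len(cluster_label)
    y = one_hot(node_label, num_classes)
    return [y[a] * cluster_label[b]
            for a in range(num_classes) for b in range(num_classes)]

def joint_cluster_loss(joint_logits, target):
    """Cross-entropy between the joint target and the softmax over C*C logits."""
    probs = softmax(joint_logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

def marginal_node_probs(joint_logits, num_classes):
    """Sum the joint distribution over the cluster dimension."""
    probs = softmax(joint_logits)
    return [sum(probs[a * num_classes + b] for b in range(num_classes))
            for a in range(num_classes)]

# Toy cluster with two training nodes and two classes (made-up values).
num_classes = 2
train_embeddings = [[1.0, 0.0], [0.0, 1.0]]
train_labels = [0, 1]

# Assumption: cluster embedding and label are means over the training nodes.
cluster_embedding = mean_rows(train_embeddings)
cluster_label = mean_rows([one_hot(y, num_classes) for y in train_labels])

# Concatenate node and cluster embeddings, then predict C*C joint logits.
node_embedding = train_embeddings[0]
classifier_input = node_embedding + cluster_embedding
weights = [
    [1.0, 0.0, 0.0, 0.0],  # joint entry (node class 0, cluster class 0)
    [1.0, 0.0, 0.0, 0.0],  # joint entry (node class 0, cluster class 1)
    [0.0, 1.0, 0.0, 0.0],  # joint entry (node class 1, cluster class 0)
    [0.0, 1.0, 0.0, 0.0],  # joint entry (node class 1, cluster class 1)
]
bias = [0.0, 0.0, 0.0, 0.0]
joint_logits = linear(weights, bias, classifier_input)

target = joint_target(train_labels[0], cluster_label)
loss = joint_cluster_loss(joint_logits, target)                # training signal
node_probs = marginal_node_probs(joint_logits, num_classes)    # inference
predicted_class = max(range(num_classes), key=lambda c: node_probs[c])
```

In this sketch the classifier emits one logit per (node class, cluster class) pair, so training compares a C×C prediction with a C×C target, while inference collapses the cluster dimension and keeps only the node-class marginal.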
