NodeReg: Mitigating the Imbalance and Distribution Shift Effects in Semi-Supervised Node Classification via Norm Consistency

Shenzhi Yang; Jun Xia; Jingbo Zhou; Xingkai Yao; Xiaofang Zhang

TL;DR

This work addresses the fragility of semi-supervised node classification under node-label imbalance and distribution shift by revealing a link between imbalanced node representation norms and degraded generalization. It introduces NodeReg, a simple regularizer that enforces consistent node representation norms through a smoothed, Lipschitz-continuous penalty, built from F̄ and δ_v terms, that is added to the cross-entropy loss. The authors provide theoretical guarantees of Lipschitz continuity and gradient smoothness, and connect norm consistency to a reduced generalization gap from the perspectives of benign overfitting and neural collapse. Empirically, NodeReg yields substantial improvements over strong baselines on imbalance tasks (F1 gains of up to 25.9% across five datasets) and modest but consistent gains under distribution shift; it is also more robust to noise and trains stably. These results suggest NodeReg as a practical, principled component for robust semi-supervised node classification.
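The regularizer described above can be sketched as follows. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation: it reads F̄ as the mean L2 norm of the node representations and δ_v as node v's deviation from that mean, applies the penalty to the output logits of all nodes (labelled and unlabelled), and smooths each deviation with sqrt(δ_v² + ε) - sqrt(ε). The function names and the parameters `lam` and `eps` are hypothetical.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single node."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def l2_norm(vec: Sequence[float]) -> float:
    """Euclidean norm of one node's representation."""
    return math.sqrt(sum(x * x for x in vec))


def norm_consistency_penalty(reps: List[Sequence[float]], eps: float = 1e-2) -> float:
    """Hypothetical smoothed penalty on deviations of node norms from their mean.

    Assumed reading of the paper's symbols:
      F_bar   = mean L2 norm over the given nodes
      delta_v = ||F_v|| - F_bar
    Each deviation passes through sqrt(delta^2 + eps) - sqrt(eps). Its derivative
    delta / sqrt(delta^2 + eps) has magnitude below 1, so the penalty is
    1-Lipschitz in delta, and that derivative is itself Lipschitz with
    constant 1 / sqrt(eps).
    """
    norms = [l2_norm(r) for r in reps]
    f_bar = sum(norms) / len(norms)
    penalties = [math.sqrt((n - f_bar) ** 2 + eps) - math.sqrt(eps) for n in norms]
    return sum(penalties) / len(penalties)


def nodereg_style_loss(logits: List[Sequence[float]], labels: Sequence[int],
                       train_idx: Sequence[int], lam: float = 0.5,
                       eps: float = 1e-2) -> float:
    """Cross-entropy on labelled nodes plus lam times the penalty on ALL nodes (assumption)."""
    ce = sum(cross_entropy(logits[i], labels[i]) for i in train_idx) / len(train_idx)
    reg = norm_consistency_penalty(logits, eps)
    return ce + lam * reg
```

For example, in `nodereg_style_loss([[2.0, 0.1, -1.0], [0.5, 1.5, 0.0], [8.0, -2.0, 0.3]], [0, 1, 0], train_idx=[0, 1])` the third node, whose logit norm lies far above the mean, contributes the largest share of the penalty even though it is unlabelled in this call. Penalizing unlabelled nodes as well is a design choice of this sketch that lets the regularizer act where supervision is scarce.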

Abstract

Aggregating information from neighboring nodes benefits graph neural networks (GNNs) in semi-supervised node classification. However, this mechanism also makes nodes susceptible to the influence of their neighbors. This happens, for instance, when neighboring nodes are imbalanced or contain noise, and the effect can even impair a GNN's ability to generalize out of distribution. We find that ensuring the consistency of node representation norms can significantly reduce the impact of these two issues on GNNs. To this end, we propose NodeReg, a regularized optimization method that enforces the consistency of node representation norms. The method is simple yet effective and satisfies Lipschitz continuity, which facilitates stable optimization and significantly improves semi-supervised node classification performance in both scenarios. In the imbalance scenario, for example, when training a GCN with an imbalance ratio of 0.1, NodeReg outperforms the most competitive baselines by 1.4% to 25.9% in F1 score across five public datasets. Similarly, in the distribution shift scenario, NodeReg outperforms the most competitive baseline by 1.4% to 3.1% in accuracy.
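The Lipschitz property claimed above can be illustrated with one concrete smoothed penalty. The definitions below are assumptions for illustration (F̄ as the mean L2 norm over N node representations F_v, δ_v as each node's deviation from it, ε > 0 a smoothing constant, λ a weight), not necessarily the paper's exact formulation:

```latex
\bar{F} = \frac{1}{N}\sum_{v=1}^{N}\lVert F_v\rVert_2, \qquad
\delta_v = \lVert F_v\rVert_2 - \bar{F}, \qquad
\phi_\varepsilon(\delta) = \sqrt{\delta^2+\varepsilon}-\sqrt{\varepsilon}

\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda\,\frac{1}{N}\sum_{v=1}^{N}\phi_\varepsilon(\delta_v)

\phi_\varepsilon'(\delta) = \frac{\delta}{\sqrt{\delta^2+\varepsilon}} \in (-1,1)
\;\Rightarrow\; \lvert\phi_\varepsilon(a)-\phi_\varepsilon(b)\rvert \le \lvert a-b\rvert

\phi_\varepsilon''(\delta) = \frac{\varepsilon}{(\delta^2+\varepsilon)^{3/2}} \le \frac{1}{\sqrt{\varepsilon}}
\;\Rightarrow\; \lvert\phi_\varepsilon'(a)-\phi_\varepsilon'(b)\rvert \le \frac{\lvert a-b\rvert}{\sqrt{\varepsilon}}
```

By contrast, a plain squared penalty δ_v² has gradient 2δ_v, which is unbounded for nodes with outlying norms, and |δ_v| is not differentiable at zero; the smoothed form avoids both, which is one way such a regularizer can keep optimization stable.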
