Posterior Label Smoothing for Node Classification
Jaeseung Heo, Moonjeong Park, Dongwoo Kim
TL;DR
PosteL addresses node classification on graphs ranging from homophilic to heterophilic by deriving soft labels from a posterior distribution over each node's label, conditioned on its neighbors' labels. The posterior combines a likelihood, given by the product of neighborhood label conditionals, with a prior given by global label frequencies; iterative pseudo-labeling refines these statistics across training rounds. Across 10 datasets and 8 backbone models, PosteL yields consistent improvements, with especially strong gains on heterophilic graphs (e.g., a 14.43% boost on Cornell with GCN), while remaining competitive on homophilic graphs. The approach is a practical, scalable soft-label regularizer that mitigates overfitting and improves generalization; code is available for replication.
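The posterior described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: edges are undirected, `-1` marks nodes without a training label, a Laplace pseudo-count (`smoothing`) is added to the edge counts, and the soft label is a plain interpolation between the one-hot label and the posterior with a hypothetical weight `alpha`. The function name `posterior_soft_labels` is likewise hypothetical.

```python
import numpy as np

def posterior_soft_labels(edges, labels, num_classes, alpha=0.5, smoothing=1.0):
    """Sketch of posterior-based soft labels (hypothetical helper).

    edges: (E, 2) int array of undirected edges.
    labels: (N,) int array; -1 marks nodes without a training label.
    alpha, smoothing: assumed hyperparameters for this sketch.
    """
    known = labels >= 0
    # Prior P(Y_i = k) from global label frequencies of labeled nodes
    prior = np.bincount(labels[known], minlength=num_classes) / max(known.sum(), 1)
    # Conditional P(Y_j = l | Y_i = k) from class co-occurrence over labeled edges
    counts = np.full((num_classes, num_classes), smoothing, dtype=float)
    for i, j in edges:
        if known[i] and known[j]:
            counts[labels[i], labels[j]] += 1
            counts[labels[j], labels[i]] += 1
    cond = counts / counts.sum(axis=1, keepdims=True)
    # Log-posterior: log prior + sum of log-likelihoods of labeled neighbors
    log_post = np.tile(np.log(prior + 1e-12), (len(labels), 1))
    for i, j in edges:
        if known[j]:
            log_post[i] += np.log(cond[:, labels[j]])
        if known[i]:
            log_post[j] += np.log(cond[:, labels[i]])
    # Normalize each row in a numerically stable way
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Soft labels for labeled nodes: interpolate one-hot target with the posterior
    one_hot = np.eye(num_classes)[labels[known]]
    soft = (1 - alpha) * one_hot + alpha * post[known]
    return post, soft

# Heterophilic toy path 0-1-2-3 with alternating classes; node 3 is unlabeled
edges = np.array([[0, 1], [1, 2], [2, 3]])
labels = np.array([0, 1, 0, -1])
post, soft = posterior_soft_labels(edges, labels, num_classes=2)
# post[3] -> approximately [0.4, 0.6]
```

Because the conditionals are estimated from the graph's own edges, the same code favors same-class neighbors on homophilic graphs and cross-class neighbors on heterophilic ones, as in this toy path where node 3's posterior leans toward class 1 even though its only labeled neighbor is class 0.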
Abstract
Label smoothing is a widely studied regularization technique in machine learning. However, its potential for node classification on graph-structured data, spanning homophilic to heterophilic graphs, remains largely unexplored. We introduce posterior label smoothing, a novel method for transductive node classification that derives soft labels from a posterior distribution conditioned on neighborhood labels. The likelihood and prior distributions are estimated from the global statistics of the graph structure, allowing our approach to adapt naturally to various graph properties. We evaluate our method on 10 benchmark datasets using eight baseline models and observe consistent improvements in classification accuracy. Further analysis shows that soft labels mitigate overfitting during training, leading to better generalization, and that pseudo-labeling effectively refines the global label statistics of the graph. Our code is available at https://github.com/ml-postech/PosteL.
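The pseudo-labeling step can be illustrated with a small sketch: confident predictions on unlabeled nodes are treated as labels, and the global statistics are re-estimated from the enlarged label set. This is a hedged illustration, not the authors' procedure; the confidence `threshold`, the Laplace `smoothing`, and the helper names `label_statistics` and `add_pseudo_labels` are our assumptions, and `probs` stands in for the predicted class probabilities of whatever backbone model is being trained.

```python
import numpy as np

def label_statistics(edges, labels, num_classes, smoothing=1.0):
    """Hypothetical helper: prior P(Y=k) and conditional P(Y_j=l | Y_i=k).

    Nodes labeled -1 are ignored; smoothing is an assumed Laplace pseudo-count.
    """
    known = labels >= 0
    prior = np.bincount(labels[known], minlength=num_classes) / max(known.sum(), 1)
    src, dst = edges[:, 0], edges[:, 1]
    both = known[src] & known[dst]
    counts = np.full((num_classes, num_classes), smoothing, dtype=float)
    # Count each undirected edge in both directions (np.add.at accumulates repeats)
    np.add.at(counts, (labels[src[both]], labels[dst[both]]), 1)
    np.add.at(counts, (labels[dst[both]], labels[src[both]]), 1)
    cond = counts / counts.sum(axis=1, keepdims=True)
    return prior, cond

def add_pseudo_labels(labels, probs, threshold=0.9):
    """Hypothetical helper: give unlabeled nodes their argmax class if confident."""
    pseudo = labels.copy()
    confident = (labels < 0) & (probs.max(axis=1) >= threshold)
    pseudo[confident] = probs[confident].argmax(axis=1)
    return pseudo

# Toy path graph 0-1-2-3-4 with only nodes 0 and 1 labeled
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
labels = np.array([0, 1, -1, -1, -1])
# Stand-in for a trained model's predicted class probabilities
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.05, 0.95], [0.6, 0.4]])

_, cond_before = label_statistics(edges, labels, 2)
pseudo = add_pseudo_labels(labels, probs)  # node 4 stays unlabeled (0.6 < 0.9)
_, cond_after = label_statistics(edges, pseudo, 2)
# cond_before[0] -> [1/3, 2/3]; cond_after[0] -> [0.2, 0.8]
```

With only one labeled edge, the estimated conditionals are dominated by the smoothing term; adding the two confident pseudo-labels contributes two more cross-class edges, so the statistics lean more clearly toward heterophily.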
