Graph Contrastive Learning under Heterophily via Graph Filters
Wenhan Yang, Baharan Mirzasoleiman
TL;DR
The paper tackles graph contrastive learning on graphs with heterophily, where neighboring nodes may belong to different classes. It introduces HLCL, a dual-filter self-supervised framework that first splits the graph into homophilic and heterophilic subgraphs based on feature similarity, then applies low-pass filtering to the homophilic part and high-pass filtering to the heterophilic part before contrasting their augmented views with a shared encoder. The authors provide theoretical analysis showing HLCL encodes both low- and high-frequency information and demonstrate empirical gains over state-of-the-art CL methods and competitive performance with supervised methods on heterophilic datasets, with scalability to large graphs. They also conduct extensive ablations to justify the dual-filter design and discuss limitations, including scenarios where feature signals fail to discriminate labels.
Abstract
Graph contrastive learning (CL) methods learn node representations in a self-supervised manner by maximizing the similarity between the augmented node representations obtained via a GNN-based encoder. However, CL methods perform poorly on graphs with heterophily, where connected nodes tend to belong to different classes. In this work, we address this problem by proposing an effective graph CL method, namely HLCL, for learning graph representations under heterophily. HLCL first identifies a homophilic and a heterophilic subgraph based on the cosine similarity of node features. It then uses a low-pass graph filter to aggregate the representations of nodes connected in the homophilic subgraph, and a high-pass graph filter to differentiate the representations of nodes connected in the heterophilic subgraph. The final node representations are learned by contrasting both the augmented high-pass filtered node views and the augmented low-pass filtered node views. Our extensive experiments show that HLCL outperforms state-of-the-art graph CL methods on benchmark datasets with heterophily, as well as on large-scale real-world graphs, by up to 7%, and outperforms graph supervised learning methods on datasets with heterophily by up to 10%.
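The two preprocessing steps described in the abstract, splitting edges by feature cosine similarity and then filtering each subgraph, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the threshold value, and the specific filter forms (I + Â for low-pass and I - Â for high-pass, with Â the symmetrically normalized adjacency of the relevant subgraph) are assumptions chosen for illustration; the abstract does not state the exact filters or how the similarity cutoff is selected. Augmentation, the shared encoder, and the contrastive loss are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def split_edges(X, edges, threshold):
    """Split edges into (homophilic, heterophilic) by endpoint feature similarity.

    `threshold` is a hypothetical cutoff, not a value taken from the paper.
    """
    homo = [(i, j) for i, j in edges if cosine(X[i], X[j]) >= threshold]
    hetero = [(i, j) for i, j in edges if cosine(X[i], X[j]) < threshold]
    return homo, hetero

def propagate(X, edges):
    """Compute A_hat @ X, where A_hat = D^{-1/2} A D^{-1/2}.

    Assumes undirected edges, each listed once, with no self-loops.
    """
    n, d = len(X), len(X[0])
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    out = [[0.0] * d for _ in range(n)]
    for i, j in edges:
        w = 1.0 / math.sqrt(deg[i] * deg[j])
        for k in range(d):
            out[i][k] += w * X[j][k]
            out[j][k] += w * X[i][k]
    return out

def low_pass(X, edges):
    """Assumed low-pass filter (I + A_hat) X: pulls neighbors together."""
    AX = propagate(X, edges)
    return [[x + a for x, a in zip(xr, ar)] for xr, ar in zip(X, AX)]

def high_pass(X, edges):
    """Assumed high-pass filter (I - A_hat) X: emphasizes neighbor differences."""
    AX = propagate(X, edges)
    return [[x - a for x, a in zip(xr, ar)] for xr, ar in zip(X, AX)]

# Toy path graph 0-1-2-3: nodes 0,1 share one feature direction, nodes 2,3 another.
X = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
edges = [(0, 1), (1, 2), (2, 3)]

homo_edges, hetero_edges = split_edges(X, edges, threshold=0.5)
Z_low = low_pass(X, homo_edges)      # filtered over the homophilic subgraph
Z_high = high_pass(X, hetero_edges)  # filtered over the heterophilic subgraph
```

In this toy graph the edge (1, 2) joins dissimilar nodes, so it lands in the heterophilic subgraph, where the high-pass view of node 1 becomes its feature vector minus that of node 2. The two similar pairs are summed by the low-pass filter instead. In the full method, augmented versions of both filtered views would be passed through the shared encoder and contrasted.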
