Molecular Graph Contrastive Learning with Line Graph
Xueyuan Chen, Shangzhe Li, Ruomei Liu, Bowen Shi, Jiaheng Liu, Junran Wu, Ke Xu
TL;DR
This work tackles label scarcity in molecular property prediction with LEMON, a line-graph-based graph contrastive learning framework. LEMON contrasts a molecular graph with its line graph using a dual-helix encoder and edge attribute fusion, which preserves molecular semantics without heavy domain knowledge. It introduces intra-local and inter-local contrastive losses to mitigate over-smoothing and address hard negatives, and demonstrates state-of-the-art transfer learning performance on eight MoleculeNet benchmarks after pre-training on 2M molecules from ZINC15. The approach offers a scalable, domain-knowledge-light paradigm for unsupervised molecular representation learning, with practical relevance to drug discovery and material design.
Abstract
Graph contrastive learning (GCL) has emerged as a response to the label scarcity that hampers molecular property prediction and drug design. Leading contrastive learning works use two kinds of view generators: random or learnable data corruption, and domain knowledge incorporation. While effective, these two approaches respectively alter molecular semantics and limit generalization capability. To this end, we relate the LinE graph with MOlecular graph coNtrastive learning and propose a novel method termed LEMON. Specifically, by contrasting the given graph with its corresponding line graph, the graph encoder can freely encode the molecular semantics without omission. Furthermore, we present a new patch with edge attribute fusion and two local contrastive losses that enhance information transmission and tackle hard negative samples. Superior performance on molecular property prediction over state-of-the-art (SOTA) view generation methods suggests the effectiveness of our proposed framework.
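To make the second view concrete, the following is a minimal sketch of how a line graph can be derived from a molecular graph: every bond becomes a node, and two such nodes are connected when the bonds share an atom. The function name `line_graph` and the integer atom indices are illustrative assumptions, not taken from the paper, and the sketch leaves out atom and bond attributes, the encoder, and the contrastive losses.

```python
from itertools import combinations


def line_graph(edges):
    """Return (nodes, edges) of the line graph of an undirected graph.

    Each bond (edge) of the input becomes a node; two such nodes are
    linked when the corresponding bonds share an atom.
    """
    # Canonicalise each undirected bond as a sorted tuple and drop duplicates.
    nodes = sorted({tuple(sorted(e)) for e in edges})

    # Group bonds by the atoms they touch.
    incident = {}
    for bond in nodes:
        for atom in set(bond):
            incident.setdefault(atom, []).append(bond)

    # Bonds meeting at the same atom are pairwise adjacent in the line graph.
    lg_edges = set()
    for bonds in incident.values():
        for a, b in combinations(bonds, 2):
            lg_edges.add((a, b))
    return nodes, sorted(lg_edges)


# Hypothetical example: heavy-atom skeleton of isobutane,
# with central carbon 0 bonded to carbons 1, 2 and 3.
bonds = [(0, 1), (0, 2), (0, 3)]
lg_nodes, lg_edges = line_graph(bonds)
```

In this example the three bonds all meet at atom 0, so the line graph is a triangle over the three bond nodes. Because no atom or bond is dropped or perturbed, this view carries the full molecular structure, in contrast to views produced by data corruption.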
