Molecular Graph Contrastive Learning with Line Graph

Xueyuan Chen, Shangzhe Li, Ruomei Liu, Bowen Shi, Jiaheng Liu, Junran Wu, Ke Xu

TL;DR

This work tackles label scarcity in molecular property prediction by proposing LEMON, a line-graph-based graph contrastive learning framework. By contrasting a molecular graph with its line graph using a dual-helix encoder and edge attribute fusion, LEMON preserves molecular semantics without heavy domain knowledge. It introduces intra-local and inter-local contrastive losses to mitigate over-smoothing and address hard negatives, and reports state-of-the-art transfer learning performance on eight MoleculeNet benchmarks after pre-training on 2M molecules from ZINC15. The approach offers a scalable paradigm for unsupervised molecular representation learning that needs little domain knowledge, with practical relevance for drug discovery and material design.

Abstract

Graph contrastive learning (GCL) has emerged as a response to the label scarcity in molecular property prediction and drug design. Leading contrastive learning works use two kinds of view generators: random or learnable data corruption, and domain knowledge incorporation. While effective, these two approaches lead to altered molecular semantics and limited generalization capability, respectively. To this end, we relate the LinE graph with MOlecular graph coNtrastive learning and propose a novel method termed LEMON. Specifically, by contrasting the given graph with the corresponding line graph, the graph encoder can freely encode the molecular semantics without omission. Furthermore, we present a new patch with edge attribute fusion and two local contrastive losses that enhance information transmission and tackle hard negative samples. Compared with state-of-the-art (SOTA) methods for view generation, the superior performance on molecular property prediction suggests the effectiveness of our proposed framework.
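To make the idea of contrasting a graph with its line graph concrete, the following is a minimal sketch of a simplified, one-directional NT-Xent-style loss (NT-Xent is the loss named in the framework overview figure). It assumes graph-level embeddings have already been produced by some encoder; the function names `cosine` and `nt_xent`, the temperature value, and the toy vectors are illustrative assumptions, and the paper's exact formulation may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(graph_emb, line_emb, temperature=0.5):
    """Simplified NT-Xent-style loss (assumed form, not the paper's code).

    graph_emb[i] and line_emb[i] are embeddings of the same molecule
    (original graph view and line graph view) and form the positive pair;
    line graph embeddings of the other molecules in the batch are negatives.
    """
    n = len(graph_emb)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarities of graph i to every line graph view.
        logits = [cosine(graph_emb[i], line_emb[j]) / temperature for j in range(n)]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        # Negative log-softmax of the positive pair.
        total += log_denominator - logits[i]
    return total / n

# Hypothetical toy batch of two molecules with 2-dimensional embeddings.
graph_views = [[1.0, 0.0], [0.0, 1.0]]
aligned_line_views = [[1.0, 0.0], [0.0, 1.0]]
swapped_line_views = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = nt_xent(graph_views, aligned_line_views)
loss_swapped = nt_xent(graph_views, swapped_line_views)
```

In this toy batch the loss is lower when each graph embedding matches the embedding of its own line graph than when the pairs are swapped, which is the behavior the contrastive objective rewards.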

Paper Structure (20 sections, 11 equations, 9 figures, 7 tables, 1 algorithm)

Figures (9)

  • Figure 1: Framework overview of LEMON. Contrasted views consist of the original graph and the corresponding line graph. Input graphs are encoded by a dual-helix graph encoder with edge attribute fusion for information consistency. The whole model is jointly optimized by minimizing the NT-Xent loss and the two local contrastive losses.
  • Figure 1: t-SNE visualization of the graph embedding on BBBP.
  • Figure 2: An illustration of line graph transformation. (a) shows a simple undirected graph $G$; (b) shows how the vertices of the line graph are derived: every vertex of the line graph is marked in green and labeled with the node pair of the corresponding edge in $G$; (c) establishes the edges of $L(G)$ according to the common nodes shared by two edges of $G$; (d) shows the output line graph $L(G)$ after transformation.
  • Figure 2: t-SNE visualization of the graph embedding on BACE.
  • Figure 3: Illustration of hard negative samples. When only graph embeddings are contrasted, it is hard for the pre-trained model to distinguish these two kinds of graphs.
  • ...and 4 more figures
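
The line graph transformation described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `line_graph`, the edge-list input format, and the toy graph are hypothetical and are not taken from the paper's code.

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph L(G) of a simple undirected graph G.

    `edges` is a list of node pairs (assumed input format).
    Returns the vertices of L(G) and its edges.
    """
    # Step (b): every edge of G becomes one vertex of L(G),
    # labeled with the (sorted) node pair of that edge.
    lg_nodes = sorted({tuple(sorted(e)) for e in edges})
    # Step (c): two vertices of L(G) are connected when the
    # corresponding edges of G share a common node.
    lg_edges = [
        (a, b) for a, b in combinations(lg_nodes, 2) if set(a) & set(b)
    ]
    # Step (d): the pair (lg_nodes, lg_edges) is the output line graph.
    return lg_nodes, lg_edges

# Hypothetical toy graph G with 4 nodes and 4 edges.
toy_edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
lg_nodes, lg_edges = line_graph(toy_edges)
```

For a molecular graph, the vertices of the line graph correspond to bonds and its edges to pairs of bonds that meet at a shared atom, so bond-level attributes become node-level attributes in the second view.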