MolGA: Molecular Graph Adaptation with Pre-trained 2D Graph Encoder

Xingtong Yu, Chang Zhou, Xinming Zhang, Yuan Fang

TL;DR

MolGA tackles the challenge of reusing pre-trained 2D graph encoders for molecular tasks with a two-stage approach: molecular alignment, which bridges topological and domain-knowledge representations, and instance-specific adaptation, in which conditional networks generate tokens that modulate atom and bond information. It uses rule-based extractors to obtain domain knowledge, projection networks to map that knowledge into the embedding space, and a contrastive loss to align the two modalities; conditional networks then tailor representations at the submolecular level while the encoder stays frozen. The result is a parameter-efficient framework that outperforms strong baselines on eleven benchmarks, showing the value of integrating diverse molecular knowledge during downstream adaptation. Practically, the approach reuses robust 2D encoders and yields fine-grained, knowledge-infused representations without additional large-scale pre-training.
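
The alignment stage can be illustrated with a minimal sketch. It assumes atom-level alignment, a rule-based extractor that returns [Pauling electronegativity, aromatic flag, degree] for each atom, a single linear projection, and an InfoNCE-style contrastive loss with temperature `tau`; the paper's actual extractors, projection nets, and loss may differ. `extract_knowledge`, `project`, `info_nce`, and the random stand-in for the frozen encoder's outputs are hypothetical, and the gradient step that would train the projection is omitted.

```python
import math
import random

# Hypothetical rule-based extractor. The Pauling electronegativities are
# real values; the choice of features is an illustrative assumption.
PAULING = {"C": 2.55, "N": 3.04, "O": 3.44}


def extract_knowledge(atom):
    """Return [electronegativity, is_aromatic, degree] for one atom record."""
    return [PAULING[atom["element"]], float(atom["aromatic"]), float(atom["degree"])]


def project(W, x):
    """Linear projection net: map a knowledge vector into the embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(topo, know, tau=0.1):
    """InfoNCE: atom i's topological embedding should match its own knowledge
    embedding (positive) rather than the other atoms' embeddings (negatives)."""
    loss = 0.0
    for i, h in enumerate(topo):
        logits = [cosine(h, k) / tau for k in know]
        m = max(logits)  # log-sum-exp trick for numerical stability
        lse = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += lse - logits[i]  # equals -log softmax(logits)[i]
    return loss / len(topo)


# Usage with made-up atoms.
atoms = [
    {"element": "C", "aromatic": True, "degree": 3},
    {"element": "N", "aromatic": True, "degree": 2},
    {"element": "O", "aromatic": False, "degree": 1},
]
random.seed(0)
d = 4  # embedding dimension (assumed)
W = [[random.gauss(0, 0.5) for _ in range(3)] for _ in range(d)]
know = [project(W, extract_knowledge(a)) for a in atoms]
# Stand-in for the frozen 2D encoder's atom embeddings (random, hypothetical).
topo = [[random.gauss(0, 1) for _ in range(d)] for _ in atoms]
print(info_nce(topo, know))
```

Using the other atoms in the same batch as negatives is the standard InfoNCE choice: minimizing the loss pulls each projected knowledge vector toward its own atom's topological embedding and away from the rest.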

Abstract

Molecular graph representation learning is widely used in chemical and biomedical research. Although pre-trained 2D graph encoders have demonstrated strong performance, they overlook the rich molecular domain knowledge associated with submolecular instances (atoms and bonds). Molecular pre-training approaches do incorporate such knowledge into their pre-training objectives, but they typically employ designs tailored to a specific type of knowledge and lack the flexibility to integrate the diverse knowledge present in molecules. Hence, reusing widely available and well-validated pre-trained 2D encoders while incorporating molecular domain knowledge during downstream adaptation offers a more practical alternative. In this work, we propose MolGA, which adapts pre-trained 2D graph encoders to downstream molecular applications by flexibly incorporating diverse molecular domain knowledge. First, we propose a molecular alignment strategy that bridges the gap between pre-trained topological representations and domain-knowledge representations. Second, we introduce a conditional adaptation mechanism that generates instance-specific tokens to enable fine-grained integration of molecular domain knowledge for downstream tasks. Finally, we conduct extensive experiments on eleven public datasets, demonstrating the effectiveness of MolGA.
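
The conditional adaptation step can be sketched in the same spirit. Here a hypothetical `ConditionalNet` (a two-layer MLP whose hidden width is `s`, echoing the hidden dimension $s$ studied in Figure 4) maps each atom's aligned knowledge embedding to a token, and the frozen encoder's atom embedding is modulated element-wise by that token. The MLP shape, the element-wise product, and the bias initialization are assumptions rather than details confirmed by this section; only the conditional network's parameters would be trained.

```python
import random


def relu(x):
    return [max(0.0, v) for v in x]


def linear(W, b, x):
    """Compute W @ x + b with plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


class ConditionalNet:
    """Hypothetical conditional network: knowledge embedding (dim d)
    -> hidden layer of width s -> token (dim d)."""

    def __init__(self, d, s, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(s)]
        self.b1 = [0.0] * s
        self.W2 = [[rng.gauss(0, 0.1) for _ in range(s)] for _ in range(d)]
        # Output bias of 1 so that tokens start close to identity modulation.
        self.b2 = [1.0] * d

    def num_params(self):
        d, s = len(self.W2), len(self.W1)
        return 2 * d * s + s + d  # W1 + b1 + W2 + b2

    def token(self, k):
        return linear(self.W2, self.b2, relu(linear(self.W1, self.b1, k)))


def adapt(h_atoms, k_atoms, net):
    """Modulate each frozen atom embedding element-wise by its own token."""
    out = []
    for h, k in zip(h_atoms, k_atoms):
        t = net.token(k)
        out.append([hi * ti for hi, ti in zip(h, t)])
    return out


# Usage with made-up values.
d, s = 4, 2
net = ConditionalNet(d, s)
h = [[0.5, -1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]]  # frozen encoder outputs
k = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0]]  # aligned knowledge embeddings
z = adapt(h, k, net)
print(net.num_params())  # 2*4*2 + 2 + 4 = 22
```

Because each token depends on the atom's own knowledge vector, two atoms with identical topological embeddings can still be adapted differently, which is the instance-specific behavior the abstract describes. Under these assumptions the trainable parameter count, $2ds + s + d$, grows linearly in $s$ and is independent of the encoder's size.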

Paper Structure

This paper contains 24 sections, 11 equations, 5 figures, 6 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Motivation of MolGA. (a) 2D topological and molecular domain knowledge in molecules. (b) Comparison of pre-training and downstream adaptation settings in 2D pre-training methods, molecular pre-training approaches and MolGA.
  • Figure 2: Overall framework of MolGA. Building upon an existing pre-trained 2D encoder, MolGA performs molecular alignment and adaptation for downstream tasks.
  • Figure 3: Impact of labeled data size (number of shots) on molecular classification.
  • Figure 4: Impact of hidden dimension $s$ in the conditional networks.
  • Figure 5: Visualization of embedding space of atoms.
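
Figure 3 varies the number of labeled examples ("shots"). One common way to build such a split is sketched below, assuming single-label classification with `k` labeled molecules drawn per class; the paper's exact sampling protocol is not described in this section, and `sample_k_shot` is a hypothetical helper.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Return sorted indices of k examples per class (hypothetical protocol)."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    chosen = []
    for y in sorted(by_class):
        pool = by_class[y]
        if len(pool) < k:
            raise ValueError(f"class {y} has only {len(pool)} examples, need {k}")
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


# Usage with made-up binary labels.
labels = [0, 1, 0, 1, 1, 0, 0, 1]
train_idx = sample_k_shot(labels, k=2)
test_idx = [i for i in range(len(labels)) if i not in train_idx]
print(train_idx, test_idx)
```

Sweeping `k` and repeating over several seeds gives the kind of curve a shots-versus-performance plot summarizes.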