Pre-training Graph Neural Networks on 2D and 3D Molecular Structures by using Multi-View Conditional Information Bottleneck

Van Thuy Hoang, O-Joun Lee

TL;DR

This work tackles multi-view molecular representation learning by pre-training graph neural networks on paired 2D and 3D molecular structures. It introduces the Multi-View Conditional Information Bottleneck (MVCIB) to maximize shared information across views while reducing view-specific noise, and employs cross-view subgraph alignment anchored on functional groups and ego-networks with a cross-attention mechanism. The approach combines subgraph sampling, 2D/3D encoders, and a suite of self-supervised objectives—including conditional information bottleneck, reconstruction, and cross-view signals—to achieve superior predictive performance, improved interpretability, and enhanced 3D geometry expressiveness, as demonstrated on MoleculeNet, QM9, and isomer-distinguishing tasks. MVCIB notably advances the ability to distinguish isomers and to capture high-order chemical substructures across views, offering a robust, scalable pre-training paradigm for molecular graphs with practical implications for drug discovery and materials science.

Abstract

Recent pre-training strategies for molecular graphs have attempted to use 2D and 3D molecular views as both inputs and self-supervised signals, primarily aligning graph-level representations. However, existing studies remain limited in addressing two main challenges of multi-view molecular learning: (1) discovering shared information between two views while diminishing view-specific information and (2) identifying and aligning important substructures, e.g., functional groups, which are crucial for enhancing cross-view consistency and model expressiveness. To address these challenges, we propose a Multi-View Conditional Information Bottleneck framework, called MVCIB, for pre-training graph neural networks on 2D and 3D molecular structures in a self-supervised setting. Our idea is to discover the shared information while minimizing irrelevant features from each view under the MVCIB principle, which uses one view as a contextual condition to guide the representation learning of its counterpart. To enhance semantic and structural consistency across views, we utilize key substructures, e.g., functional groups and ego-networks, as anchors between the two views. Then, we propose a cross-attention mechanism that captures fine-grained correlations between the substructures to achieve subgraph alignment across views. Extensive experiments in four molecular domains demonstrated that MVCIB consistently outperforms baselines in both predictive performance and interpretability. Moreover, MVCIB achieved 3d Weisfeiler-Lehman (3d-WL) expressive power, distinguishing not only non-isomorphic graphs but also different 3D geometries that share identical 2D connectivity, such as isomers.
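
The cross-attention step mentioned in the abstract can be illustrated with a generic scaled dot-product attention between two sets of subgraph embeddings. This is only a minimal sketch under stated assumptions: the toy embeddings, the function names, and the absence of learned projection matrices are all hypothetical choices made here for illustration, not the architecture used in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention: each query attends over all keys.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of value vectors, one output coordinate at a time.
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy embeddings: 2 subgraphs from the 2D view, 3 from the 3D view.
subgraphs_2d = [[1.0, 0.0], [0.0, 1.0]]
subgraphs_3d = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# 2D subgraphs act as queries; 3D subgraphs act as keys and values.
aligned, attn = cross_attention(subgraphs_2d, subgraphs_3d, subgraphs_3d)
```

Each row of `attn` is a probability distribution over the 3D subgraphs, so a large entry can be read as a soft alignment between one 2D subgraph and one 3D subgraph.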

Paper Structure

This paper contains 48 sections, 26 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: (a) It is difficult to distinguish cis and trans isomers based only on their 2D molecular graphs. (b) Incorporating a nitro group (R-$NO_2$) into an aromatic hydroxyl group (R-$C_6H_5$, R-$OH$) can change its chemical properties.
  • Figure 2: The overall architecture of MVCIB, consisting of (i & ii) subgraph sampling and alignment and (iii) the MVCIB principle.
  • Figure 3: Given a target node $v_5$ and its surrounding substructures, (a) the 1d Weisfeiler–Lehman (1d-WL) test fails to distinguish the two non-isomorphic substructures rooted at $v_5$, even when the observation range is increased. (b) Our proposed model, MVCIB, successfully distinguishes them: once the observation range reaches $2$ hops, the two substructures differ in the edges among the neighbours of $v_5$, and MVCIB can tell them apart.
  • Figure 4: An efficiency analysis for variants of our proposed model: MVCIB with and without pre-training. The solid and dashed lines are training and validation curves, respectively.
  • Figure 5: Qualitative analysis on functional group detection tasks.
  • ...and 2 more figures
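
The limitation described in the caption of Figure 3(a) can be reproduced on a small example. The sketch below runs 1d-WL colour refinement on a 6-cycle and on two disjoint triangles, a standard pair of non-isomorphic graphs that 1d-WL cannot separate. These graphs are an assumption chosen for illustration and are not the substructures drawn in the figure; the function names are likewise hypothetical.

```python
from collections import Counter


def wl_colors(adj, rounds=3):
    """Run 1d-WL colour refinement and return the histogram of final colours.

    Colours are kept as nested tuples (own colour, sorted neighbour colours)
    so that they are directly comparable across different graphs.
    """
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


def cycle(nodes):
    """Adjacency lists of a simple cycle over the given nodes."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}


def count_triangles(adj):
    """Count triangles a < b < c; an isomorphism invariant that 1d-WL misses here."""
    return sum(
        1
        for a in adj for b in adj[a] for c in adj[b]
        if a < b < c and a in adj[c]
    )


hexagon = cycle([0, 1, 2, 3, 4, 5])
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}

# Identical colour histograms: 1d-WL cannot tell the two graphs apart.
same_wl = wl_colors(hexagon) == wl_colors(two_triangles)
# Different triangle counts: the graphs are nevertheless non-isomorphic.
triangles = (count_triangles(hexagon), count_triangles(two_triangles))
```

Both graphs are 2-regular, so every node receives the same colour in every round, whereas a model that sees the edges among a node's neighbours (as the caption describes for the 2-hop case) has access to the information that separates them.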

Theorems & Definitions (2)

  • Definition 1: IB
  • Definition 2: MVCIB
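
The definitions themselves are not reproduced in this listing. For orientation only, the information bottleneck objective in its standard form from the general literature is given below, where $X$ is the input, $Y$ the target, $Z$ the learned representation, $I(\cdot;\cdot)$ mutual information, and $\beta > 0$ a trade-off weight. The symbols are assumptions made here; the paper's Definition 1 may use different notation, and Definition 2 (MVCIB) is specific to the paper and is not restated.

$$
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
$$

The first term compresses the input, and the second rewards keeping the information in $Z$ that is relevant to $Y$.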