Representational Alignment with Chemical Induced Fit for Molecular Relational Learning

Peiliang Zhang, Jingling Yuan, Qing Xie, Yongjun Zhu, Lin Li

TL;DR

This work targets instability in Molecular Relational Learning (MRL) caused by attention-based inductive biases that lack chemical-domain guidance. It introduces ReAlignFit, a chemistry-informed framework that combines SRIN, which encodes substructures, with a Dynamic Representational Alignment Module (DRAM). DRAM incorporates a Bias Correction Function and the Subgraph Information Bottleneck to dynamically align core substructures during induced-fit-like interactions. Theoretical analysis links stability to the separation of core and confounding substructures, and the model optimizes a loss that combines a prediction term with calibrated mutual-information terms. Empirical results on nine datasets across molecular interaction (MI) and drug-drug interaction (DDI) tasks show improved predictive performance and, critically, enhanced stability under rule-shifted and scaffold-shifted distributions. The approach demonstrates the practical potential of domain-guided, dynamic representational alignment for robust molecular reasoning.
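
For orientation, the generic Subgraph Information Bottleneck objective from the graph-learning literature can be written as follows. This is the standard form, not necessarily the exact loss ReAlignFit optimizes; the trade-off weight $\beta$ and the reuse of $\mathcal{G}^{c}$ for the selected subgraph are notational assumptions made here.

$$
\min_{\mathcal{G}^{c} \subseteq \mathcal{G}} \; -I(\mathcal{G}^{c};\mathcal{Y}) + \beta\, I(\mathcal{G}^{c};\mathcal{G})
$$

The first term rewards subgraphs that are predictive of the target $\mathcal{Y}$; the second penalizes information retained from the full graph $\mathcal{G}$, which pushes the selected subgraph to discard confounding parts.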

Abstract

Molecular Relational Learning (MRL) is widely applied in the natural sciences to predict relationships between molecular pairs by extracting structural features. The representational similarity between substructure pairs determines the functional compatibility of molecular binding sites. However, aligning substructure representations with attention mechanisms lacks guidance from chemical knowledge, which leads to unstable model performance on data shifted in chemical space (e.g., by functional group or scaffold). With theoretical justification, we propose Representational Alignment with Chemical Induced Fit (ReAlignFit) to enhance the stability of MRL. ReAlignFit dynamically aligns substructure representations in MRL by introducing an inductive bias based on chemical Induced Fit. In the induction process, we design the Bias Correction Function, based on substructure edge reconstruction, to align representations between substructure pairs by simulating chemical conformational changes (the dynamic combination of substructures). During the fit process, ReAlignFit further integrates the Subgraph Information Bottleneck to refine and optimize substructure pairs with high chemical functional compatibility, and uses them to generate molecular embeddings. Experimental results on nine datasets demonstrate that ReAlignFit outperforms state-of-the-art models on two tasks and significantly enhances model stability under both rule-shifted and scaffold-shifted data distributions.
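
To make the scaffold-shifted setting concrete, the sketch below shows one common way to build such a split: molecular pairs are grouped by scaffold, and whole groups are assigned to either train or test so that no test scaffold is seen during training. It is a generic illustration, not the paper's protocol. The function name `scaffold_shifted_split`, the `scaffold_of` mapping (assumed to be precomputed with a cheminformatics toolkit), and grouping by the first molecule of each pair are all assumptions.

```python
from collections import defaultdict


def scaffold_shifted_split(pairs, scaffold_of, test_fraction=0.2):
    """Split molecular pairs so that test scaffolds never occur in training.

    pairs:        list of (mol_x, mol_y, label) tuples (hypothetical layout)
    scaffold_of:  dict mapping a molecule id to its scaffold key
                  (assumed precomputed; not derived here)
    """
    # Group pairs by the scaffold of the first molecule (a simplification).
    groups = defaultdict(list)
    for pair in pairs:
        groups[scaffold_of[pair[0]]].append(pair)

    # Visit the largest scaffold groups first so that common scaffolds fill
    # the training set and rarer scaffolds tend to end up in the test set.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    train_budget = len(pairs) - round(len(pairs) * test_fraction)
    train, test = [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy usage with made-up molecule ids and scaffold keys.
scaffold_of = {"m1": "benzene", "m2": "benzene", "m3": "benzene",
               "m4": "pyridine", "m5": "indole"}
pairs = [("m1", "s1", 0.1), ("m2", "s1", 0.2), ("m3", "s2", 0.3),
         ("m4", "s1", 0.4), ("m5", "s2", 0.5)]
train, test = scaffold_shifted_split(pairs, scaffold_of, test_fraction=0.4)
```

Because whole scaffold groups are kept together, the realized test fraction only approximates `test_fraction`; the guarantee the split provides is that the train and test scaffold sets are disjoint.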

Paper Structure

This paper contains 32 sections, 3 theorems, 29 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Theorem 2.1

Given the molecular pair $(\mathcal{G}_{x},\mathcal{G}_{y})$ and the prediction target $\mathcal{Y}$, where the substructure $\mathcal{G}^{s}$ of $\mathcal{G}$ consists of the core substructure $\mathcal{G}^{c}$ and the confounding substructure $\mathcal{G}^{n}$. For all $\mathcal{G}_{x}$, ... where $\mathcal{P}(\mathcal{G}_{x},\mathcal{G}_{y};\mathcal{Y})$ is the true probability between ...

Figures (7)

  • Figure 1: The motivating example. (a) When molecule A reacts with molecules B and C, the core substructures are -COOH and -NH2, respectively. (b) The properties of -CO within molecule E are influenced by the surrounding reactive atom -Cl, which changes its behavior in chemical reactions.
  • Figure 2: The model structure of ReAlignFit. (a) SRIN generates substructure representations. (b) DRAM aligns and optimizes the core substructure representations to generate stable representations of molecules.
  • Figure 3: The performance and RPD of ReAlignFit, CGIB and CIGIN in different data distributions.
  • Figure 4: Results of the ablation experiment.
  • Figure 5: Results of the confusion analysis on the HetionetDDI dataset.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Theorem 2.1
  • Definition 3.1
  • Proposition 3.1
  • Proposition 3.2