
Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning

Xiangzhe Kong, Wenbing Huang, Yang Liu

TL;DR

The paper tackles cross-domain 3D molecular interaction learning by proposing a unified geometric graph of sets to represent complexes and a Generalist Equivariant Transformer (GET) that processes matrix-form, variable-size block and atom features with $E(3)$-equivariance. GET combines a bilevel attention mechanism, an equivariant feed-forward network, and equivariant layer normalization to preserve fine-grained geometric information across blocks, enabling simultaneous modeling of intra-block and inter-block interactions. The authors demonstrate superior performance over domain-specific and vanilla unified baselines on protein–protein, protein–ligand, and RNA/DNA–ligand affinity tasks, and show robust cross-domain generalization including zero-shot predictions on unseen domains. These results indicate a pathway toward universal molecular representation learning that leverages shared interaction physics across domains, with practical impact for drug discovery and biomolecular engineering.
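The following is a minimal sketch of a geometric graph of sets, written in Python with NumPy purely as an illustration. Each node (block) holds a variable-size atom feature matrix ${\bm{H}}_i$ and coordinate matrix $\vec{{\bm{X}}}_i$. Several pieces are hypothetical choices, not the paper's construction:

- the class `Block`;
- the helpers `block_distance` and `knn_block_graph`;
- the rule that connects each block to its `k` nearest blocks under the minimum inter-atom distance.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Block:
    """One node of a hypothetical graph of sets: a variable-size set of atoms."""
    H: np.ndarray  # (n_i, d) atom features
    X: np.ndarray  # (n_i, 3) atom coordinates

def block_distance(a: Block, b: Block) -> float:
    """Assumed block distance: minimum distance between any atom of a and any atom of b."""
    diff = a.X[:, None, :] - b.X[None, :, :]  # (n_a, n_b, 3) pairwise differences
    return float(np.sqrt((diff ** 2).sum(-1)).min())

def knn_block_graph(blocks, k):
    """Directed block-level edges (i, j), where j is one of the k blocks nearest to i."""
    edges = []
    for i, bi in enumerate(blocks):
        dists = sorted(
            (block_distance(bi, bj), j) for j, bj in enumerate(blocks) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Example: four blocks of different sizes (3, 1, 5 and 2 atoms) with 4-dim features,
# shifted apart along the diagonal so that neighbouring indices are spatially close.
rng = np.random.default_rng(0)
blocks = [Block(rng.normal(size=(n, 4)), rng.normal(size=(n, 3)) + 5.0 * i)
          for i, n in enumerate([3, 1, 5, 2])]
print(knn_block_graph(blocks, k=2))
```

Each block keeps all of its atoms rather than a pooled summary. A residue-sized block and a one-atom block can therefore sit in the same graph without losing atom-level detail.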

Abstract

Many processes in biology and drug discovery involve various 3D interactions between molecules, such as protein–protein and protein–small-molecule interactions. Because different molecules are usually represented at different granularities, existing methods typically encode each type of molecule independently with a separate model, which limits their ability to learn the various underlying interaction physics. In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, which makes it possible to encode all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. Specifically, GET consists of a bilevel attention module, a feed-forward module and a layer normalization module, each of which is E(3)-equivariant and specialized for handling sets of variable sizes. Notably, in contrast to conventional pooling-based hierarchical models, GET retains fine-grained information at all levels. Extensive experiments on the interactions between proteins, small molecules and RNA/DNAs verify the effectiveness and generalization capability of our proposed method across different domains.
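The sketch below makes the atom-level half of the bilevel attention more concrete. It shows one E(3)-equivariant cross-attention step in which every atom of block $i$ attends to every atom of a neighboring block $j$. Several names are illustrative assumptions: the function name, the weight matrices `Wq`, `Wk`, `Wv`, the distance penalty `w_dist` and the coordinate `step` size. The paper's exact equations may differ, and the block-level attention over neighbors is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def atom_cross_attention(Hi, Xi, Hj, Xj, Wq, Wk, Wv, w_dist, step=0.1):
    """Hypothetical atom-level cross-attention from block j onto block i.

    Hi: (n_i, d), Xi: (n_i, 3), Hj: (n_j, d), Xj: (n_j, 3).
    """
    rel = Xi[:, None, :] - Xj[None, :, :]                 # (n_i, n_j, 3) relative vectors
    dist2 = (rel ** 2).sum(-1)                             # (n_i, n_j), E(3)-invariant
    logits = (Hi @ Wq) @ (Hj @ Wk).T - w_dist * dist2      # invariant attention logits
    alpha = softmax(logits, axis=1)                        # normalize over atoms of block j
    H_new = Hi + alpha @ (Hj @ Wv)                         # invariant feature update
    X_new = Xi + step * (alpha[:, :, None] * rel).sum(1)   # moves only along relative vectors
    return H_new, X_new
```

The design rule is simple. Attention weights depend only on invariant scalars (features and squared distances), and coordinates move only along relative vectors $\vec{x}_p - \vec{x}_q$. Rotating, reflecting or translating both blocks therefore transforms the output coordinates in the same way and leaves the features unchanged.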

Paper Structure

This paper contains 45 sections, 5 theorems, 24 equations, 6 figures, 16 tables.

Key Result

Theorem 3.1

Denote the proposed Generalist Equivariant Transformer as $\{{\bm{H}}_i', \vec{{\bm{X}}}_i'\} = \mathrm{GET}(\{{\bm{H}}_i, \vec{{\bm{X}}}_i\})$. Then it satisfies E(3)-equivariance and intra-block permutation invariance. Namely, $\forall g \in \text{E(3)}, \forall \{\pi_i \in S_{n_i} | 1 \leq i \leq B\}$, wh…
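The two properties in Theorem 3.1 can be tested numerically for any candidate module. The sketch below is an illustration, not the paper's feed-forward network, and has two parts:

- `equivariant_ffn` is a hypothetical block-aware update. It mixes each atom's feature with the block-mean feature and with the atom's distance to the block centroid.
- `check_theorem_properties` is a helper that applies a random orthogonal transform, a translation and an intra-block atom permutation.

The helper reads intra-block permutation invariance as follows: reordering a block's atoms reorders the per-atom outputs identically, so the block's output set is unchanged. This is an interpretation, since the theorem statement here is truncated.

```python
import numpy as np

def equivariant_ffn(H, X, W1, W2, w):
    """Hypothetical block-aware FFN for one block of n atoms.

    H: (n, d) invariant features, X: (n, 3) coordinates.
    W1: (2d+1, m), W2: (m, d), w: (m, 1).
    """
    centroid = X.mean(0, keepdims=True)                             # (1, 3), moves with the block
    rel = X - centroid                                              # (n, 3) relative positions
    r = np.linalg.norm(rel, axis=1, keepdims=True)                  # (n, 1) invariant radii
    h_block = np.repeat(H.mean(0, keepdims=True), len(H), axis=0)   # block context per atom
    z = np.tanh(np.concatenate([H, h_block, r], axis=1) @ W1)       # (n, m) invariant
    H_new = H + z @ W2                                              # (n, d) feature update
    X_new = X + rel * (z @ w)                                       # scale along x_p - centroid
    return H_new, X_new

def check_theorem_properties(layer, H, X, rng):
    """Test E(3)-equivariance and intra-block permutation behaviour of `layer`."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix (may reflect)
    t = rng.normal(size=3)                          # random translation
    perm = rng.permutation(len(H))                  # random reordering of the block's atoms
    H1, X1 = layer(H, X)
    H2, X2 = layer(H[perm], X[perm] @ Q.T + t)
    return bool(np.allclose(H1[perm], H2) and np.allclose(X1[perm] @ Q.T + t, X2))
```

For weights of shapes `(2d+1, m)`, `(m, d)` and `(m, 1)`, the call `check_theorem_properties(lambda H, X: equivariant_ffn(H, X, W1, W2, w), H, X, np.random.default_rng(0))` should return `True`. By contrast, a layer that adds the first coordinate column to every coordinate fails the check.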

Figures (6)

  • Figure 1: Domain-specific representations and unified representations in molecular interaction.
  • Figure 2: Overview of the unified representation and the equivariant modules in our Generalist Equivariant Transformer (GET). From left to right: The unified representation treats molecules as geometric graphs of sets according to predefined building blocks; The bilevel attention module captures both sparse block-level and dense atom-level interactions via an equivariant attention mechanism; The feed-forward network injects the block-level information into the intra-block atoms; The layer normalization transforms the input distribution with trainable scales and offsets.
  • Figure 3: The scheme of a layer of Generalist Equivariant Transformer, where block $i$ ($\bm{H}_i$, $\vec{\bm{X}}_i$) is updated by its neighbors ($\{\bm{H}_j\}$, $\{\vec{\bm{X}}_j\}$, $j\in {\mathcal{N}}_b(i)$). $\times$, $+$, and $\oplus$ denote multiplication, addition and concatenation, respectively. (Left) The overall workflow of a layer and details of the block-level attention. (Right) The details of the atom-level cross attention. GET is composed of $N$ such layers.
  • Figure 4: Performance with respect to the dimensions of the hidden layers (left) and the number of layers (right) on protein-protein affinity (PPA) and ligand-binding affinity (LBA).
  • Figure 5: Performance (Pearson Correlation) with respect to the number of nearest neighbors on protein-protein affinity (PPA) and ligand-binding affinity (LBA).
  • ...and 1 more figure
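Figure 2 describes the layer normalization as transforming the input distribution with trainable scales and offsets. One common way to do this equivariantly is sketched here under that assumption; it is not necessarily GET's exact module. It applies a standard LayerNorm to the invariant features. For coordinates, it centers them on their mean, divides by the root-mean-square distance to that mean and rescales by a trainable scalar. The names `gamma_h`, `beta_h` and `gamma_x` are hypothetical.

```python
import numpy as np

def equivariant_layer_norm(H, X, gamma_h, beta_h, gamma_x, eps=1e-5):
    """Hypothetical equivariant layer norm over a set of n atoms.

    H: (n, d) invariant features, X: (n, 3) coordinates.
    gamma_h, beta_h: (d,) trainable feature scale and offset; gamma_x: scalar.
    """
    # Invariant features: ordinary LayerNorm over the feature dimension.
    mu = H.mean(-1, keepdims=True)
    var = H.var(-1, keepdims=True)
    H_out = gamma_h * (H - mu) / np.sqrt(var + eps) + beta_h

    # Coordinates: center, normalize the RMS distance to the center, rescale, re-add the center.
    centre = X.mean(0, keepdims=True)               # (1, 3), moves with the set
    rel = X - centre                                 # (n, 3) relative positions
    rms = np.sqrt((rel ** 2).sum(-1).mean() + eps)   # scalar, E(3)-invariant
    X_out = centre + gamma_x * rel / rms
    return H_out, X_out
```

The coordinates deliberately get no learned offset. Adding a fixed vector would break rotation equivariance, whereas a scalar scale commutes with any rotation or reflection, and re-adding the mean keeps translations equivariant.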

Theorems & Definitions (10)

  • Theorem 3.1: E(3)-Equivariance and Intra-Block Permutation Invariance
  • Definition 3.2: E(3)-equivariance
  • Lemma 3.3
  • Proof
  • Lemma 3.4
  • Proof
  • Lemma 3.5
  • Proof
  • Lemma 3.6
  • Proof
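For reference, here is the standard notion of E(3)-equivariance, which Definition 3.2 presumably formalizes, written in the notation of Theorem 3.1. It assumes a row-vector convention in which each row of $\vec{{\bm{X}}}_i$ is one atom and $g=({\bm{Q}},\vec{{\bm{t}}})$ acts by an orthogonal matrix and a translation. The paper's own wording of the definition may differ.

```latex
\begin{align}
g \cdot \vec{{\bm{X}}}_i &= \vec{{\bm{X}}}_i {\bm{Q}}^\top + \mathbf{1}\,\vec{{\bm{t}}}^\top,
  \qquad {\bm{Q}} \in \mathrm{O}(3),\ \vec{{\bm{t}}} \in \mathbb{R}^3, \\
\{{\bm{H}}_i',\, g \cdot \vec{{\bm{X}}}_i'\} &= \mathrm{GET}(\{{\bm{H}}_i,\, g \cdot \vec{{\bm{X}}}_i\})
  \qquad \forall g \in \mathrm{E}(3).
\end{align}
```

In words: the features stay invariant, and the coordinates transform with $g$.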