
E(3)-equivariant models cannot learn chirality: Field-based molecular generation

Alexandru Dumitrescu, Dani Korpela, Markus Heinonen, Yogesh Verma, Valerii Iakovlev, Vikas Garg, Harri Lähdesmäki

TL;DR

This work shows that $E(3)$-invariant point-cloud diffusion models cannot distinguish molecular chirality, a crucial factor for drug safety and efficacy. To address this, it introduces Field-based Molecule Generation (FMG), which uses atom and bond density fields on a 3D grid and a diffusion model with reference rotations to generate chiral, geometry-rich molecules. Theoretical results prove the chirality limitation of $E(3)$-invariant parameterizations and demonstrate the impractical $\boldsymbol{O}(n^4)$ feature requirement for chiral-aware SE(3) invariants, while FMG achieves competitive state-of-the-art performance on QM9 and GEOM-Drugs and yields accurate enantiomer distributions. Empirically, FMG demonstrates strong neutrality, robust graph and conformational metrics, and explicit chirality awareness, offering a practical path toward chirality-correct drug-like molecular generation. The approach paves the way for scalable, chirality-sensitive 3D molecular generation by trading full $E(3)$ invariance for deterministic frame alignment and field-based representations.

Abstract

Obtaining the desired effect of drugs is highly dependent on their molecular geometries. Thus, the current prevailing paradigm focuses on 3D point-cloud atom representations, utilizing graph neural network (GNN) parametrizations, with rotational symmetries baked in via E(3)-invariant layers. We prove that such models must necessarily disregard chirality, the geometric property of molecules that cannot be superimposed on their mirror image by rotation and translation. Chirality plays a key role in determining drug safety and potency. To address this glaring issue, we introduce a novel field-based representation, proposing reference rotations that replace rotational symmetry constraints. The proposed model captures all molecular geometries, including chirality, while still achieving highly competitive performance with E(3)-based methods across standard benchmarking metrics.
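
To make the idea of a field-based representation concrete, the following is a minimal sketch of how atoms might be rasterized into a per-type density field on a 3D grid. It is an illustration only, not the paper's implementation: the Gaussian kernel, the function name `atom_density_field`, and the parameters `grid_size`, `extent`, and `sigma` are all assumptions, and bond fields are omitted.

```python
import numpy as np

def atom_density_field(positions, types, n_types, grid_size=17, extent=4.0, sigma=0.5):
    """Rasterize atoms into a (n_types, G, G, G) density field.

    Assumption: each atom contributes an isotropic Gaussian to the channel
    of its type. The kernel and all parameter values are hypothetical.
    """
    axis = np.linspace(-extent, extent, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)  # (G, G, G, 3) voxel centres
    field = np.zeros((n_types, grid_size, grid_size, grid_size))
    for pos, t in zip(positions, types):
        sq_dist = np.sum((grid - pos) ** 2, axis=-1)
        field[t] += np.exp(-sq_dist / (2.0 * sigma ** 2))
    return field

# Toy two-atom example: type 0 at the origin, type 1 at x = 1.1.
positions = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]])
types = [0, 1]
u_a = atom_density_field(positions, types, n_types=2)

# Atom locations can be read back as the peaks of each channel.
peak_type0 = np.unravel_index(np.argmax(u_a[0]), u_a[0].shape)
peak_type1 = np.unravel_index(np.argmax(u_a[1]), u_a[1].shape)
```

Because such a field lives on a fixed grid rather than being built from pairwise distances, a molecule and its mirror image generally produce different arrays, which is what allows a model on this representation to tell enantiomers apart.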

Paper Structure (79 sections, 3 theorems, 44 equations, 22 figures, 6 tables, 1 algorithm)


Key Result

Proposition 1

If $p_\phi$ is an E(3)-invariant probability distribution, then $p_\phi(\mathbf{m})=p_\phi(\mathbf{m}')$, where $\mathbf{m}$ and $\mathbf{m}'$ are the $\mathbb{R}^{3\times N}$ atom positions of an enantiomer pair of molecules with $N$ atoms.
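
The intuition behind the proposition can be checked numerically. The sketch below is an assumption-laden toy, not taken from the paper: it uses a hypothetical five-atom point cloud (stored as $N \times 3$ rather than $3 \times N$ for convenience), the inter-atomic distance matrix as a typical E(3)-invariant feature, and a signed volume as a feature that is invariant to rotations and translations but not to reflections.

```python
import numpy as np

def pairwise_distances(x):
    """E(3)-invariant feature: matrix of inter-atomic distances; x has shape (N, 3)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def signed_volume(x):
    """Signed volume spanned by atoms 1..3 relative to atom 0.

    Unchanged by rotations and translations, but flips sign under reflection.
    """
    return np.linalg.det(x[1:4] - x[0])

# Hypothetical chiral centre at the origin with four distinct substituents.
m = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.2, 0.0],
    [0.0, 0.0, 1.5],
    [-0.7, -0.8, -0.9],
])

# Mirror image: reflect every atom through the xy-plane.
mirror = np.diag([1.0, 1.0, -1.0])
m_prime = m @ mirror

# The distance matrices coincide, so any function of them cannot separate m from m'.
same_distances = np.allclose(pairwise_distances(m), pairwise_distances(m_prime))

# The signed volume changes sign, so it does separate the pair.
vol, vol_prime = signed_volume(m), signed_volume(m_prime)
```

Here `same_distances` evaluates to `True` while `vol` and `vol_prime` have opposite signs: a model that sees only reflection-invariant quantities necessarily assigns both configurations the same probability.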

Figures (22)

  • Figure 1: E(3)-invariant spatial features (e.g., relative bond angles and distances) are not sufficient to represent chirality. a) A general chiral molecule pair, which cannot be superimposed by rotations. b) Chiral S-ibuprofen is an effective COX inhibitor, relieving pain and inflammation, while its mirror image R-ibuprofen is not [evans2001].
  • Figure 2: a) Atom $\mathbf{u}_a$ and bond $\mathbf{u}_b$ fields noise schedules. b) Sampling illustration. A subset of 200 3D locations with the highest values are shown for all $\mathbf{u}_a$ and $\mathbf{u}_b$ channels. c) Extracting atoms and bonds from the resulting $\mathbf{u}_a$ and $\mathbf{u}_b$ fields at $t=0$. d) Visualizing the optimized atoms and bonds.
  • Figure 3: Kernel density estimate of $\det {\bm{R}}$, computed from the data and from generated molecules. EDM is unable to distinguish and correctly generate the enantiomer distribution, while the E(3)-variant method (GNN) generates improper, zero-volume conformations. FMG matches the enantiomer distribution.
  • Figure 4: Cumulative distribution function plot of generated and training data for MiDi, EDM, and FMG for the GEOM-Drugs and QM9 datasets. All models capture these conformational properties very well on QM9, and reasonably well on GEOM.
  • Figure 5: Cumulative distribution function plot of generated and training data energy conformations (kcal/mol) for MiDi, EDM, and FMG for the GEOM-Drugs and QM9 datasets. On GEOM, not enough valid samples were obtained to compute the energy distribution. Both models generate higher energies than the data on GEOM-Drugs (notably, MiDi was trained on a subset of lower energy molecules on GEOM).
  • ...and 17 more figures

Theorems & Definitions (6)

  • Proposition 1
  • Lemma 2
  • Proposition 3
  • Remark 1
  • Remark 2
  • Remark 3