VecMol: Vector-Field Representations for 3D Molecule Generation

Yuchen Hua, Xingang Peng, Jianzhu Ma, Muhan Zhang

Abstract

Generative modeling of three-dimensional (3D) molecules is a fundamental yet challenging problem in drug discovery and materials science. Existing approaches typically represent molecules as 3D graphs and co-generate discrete atom types with continuous atomic coordinates, leading to intrinsic learning difficulties such as heterogeneous modality entanglement and geometry-chemistry coherence constraints. We propose VecMol, a paradigm-shifting framework that reimagines molecular representation by modeling 3D molecules as continuous vector fields over Euclidean space, where vectors point toward nearby atoms and implicitly encode molecular structure. The vector field is parameterized by a neural field and generated using a latent diffusion model, avoiding explicit graph generation and decoupling structure learning from discrete atom instantiation. Experiments on the QM9 and GEOM-Drugs benchmarks validate the feasibility of this novel approach, suggesting vector-field-based representations as a promising new direction for 3D molecular generation.

Paper Structure (55 sections, 34 equations, 12 figures, 5 tables, 1 algorithm)

Figures (12)

  • Figure 1: Overview of the proposed neural field framework for 3D molecular modeling. The figure illustrates two tightly coupled pipelines that share the same neural field decoder and reconstruction module. Top: Latent neural field encoding and reconstruction. A 3D molecule, represented by its atomic coordinates and types, is first encoded by a neural field encoder $E_\phi$ into a grid-based latent field $\mathbf{z} \in \mathbb{R}^{L^3 \times d}$, where each spatial location stores a local latent code. Given a set of spatial query points $Q = \{\mathbf{q}_i\}_{i=1}^m \in \mathbb{R}^{m\times 3}$, a neural field decoder $D_\psi$ maps the latent field to a continuous molecular vector field $\mathbf{V} = D_\psi(Q, \mathbf{z})$. Bottom: Latent field diffusion and molecular generation. A denoising diffusion probabilistic model is trained in the latent field space. Starting from Gaussian noise $\mathbf{z}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, the diffusion model progressively denoises the latent variables to obtain a sampled latent field $\mathbf{z}_0$. The neural field decoder and the reconstruction module then convert the sampled vector field into a discrete molecular structure through iterative gradient-based ascent and merging operations (see Section \ref{sec:reconstruction}).
  • Figure 2: Element-specific gradient magnitude on a planar cross-section of a representative molecule. High-gradient regions surround atomic nuclei, with element-dependent spatial extent and intensity.
  • Figure 3: Atomic coordinate reconstruction from neural vector fields. Left: reconstruction from raw neural field codes; right: reconstruction after latent denoising.
  • Figure 4: Comparison of molecular geometry. From left to right: ground truth, reconstruction from raw codes, and reconstruction from denoised codes.
  • Figure 5: Cumulative distribution of field RMSD versus ground truth for NF and Diff on 1,000 molecules from QM9 and GEOM-Drugs. Both fields closely match the ground truth.
  • ...and 7 more figures
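
The Figure 1 caption describes recovering atoms from a vector field by iterative ascent followed by merging. The sketch below illustrates that idea on a toy problem and is not the paper's implementation. Several pieces are assumptions made for illustration: the field `toy_vector_field` is an analytic stand-in for the learned decoder $D_\psi$ and simply points from each query to its nearest atom, and the step size, step count, merge tolerance, and the name `reconstruct_atoms` are all hypothetical. Atom types are ignored.

```python
import numpy as np


def toy_vector_field(queries, atoms):
    """Hypothetical stand-in for the decoder: vector from each query to its nearest atom."""
    diff = atoms[None, :, :] - queries[:, None, :]   # (m, n, 3) query-to-atom offsets
    dist = np.linalg.norm(diff, axis=-1)             # (m, n) distances
    nearest = dist.argmin(axis=1)                    # (m,) index of the closest atom
    return diff[np.arange(len(queries)), nearest]    # (m, 3)


def reconstruct_atoms(field_fn, queries, steps=50, step_size=0.5, merge_tol=0.1):
    """Move query points along the field, then merge points that end up close together."""
    points = queries.copy()
    for _ in range(steps):
        # Each step moves a point a fraction of the way along its field vector.
        points = points + step_size * field_fn(points)

    # Greedy merging: keep a running mean for each cluster of converged points.
    merged = []  # list of (center, count)
    for p in points:
        for i, (center, count) in enumerate(merged):
            if np.linalg.norm(p - center) < merge_tol:
                merged[i] = ((center * count + p) / (count + 1), count + 1)
                break
        else:
            merged.append((p, 1))
    return np.array([center for center, _ in merged])


# Toy molecule with three atoms (coordinates are arbitrary, for illustration only).
atoms = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])

# Regular grid of query points covering the molecule.
axis = np.linspace(-1.0, 2.5, 6)
queries = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

recovered = reconstruct_atoms(lambda q: toy_vector_field(q, atoms), queries)
print(recovered.shape)
```

In this toy setting every query point contracts toward the atom whose Voronoi cell it starts in, so the merged cluster centers coincide with the atom positions. With a learned field the vectors are only approximate, so the merge tolerance and step schedule would need tuning.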