Implicit Neural Representations of Molecular Vector-Valued Functions

Jirka Lhotka, Daniel Probst

TL;DR

The paper introduces molecular neural fields, a vector-valued implicit representation $f: \mathbb{R}^3 \to \mathbb{R}^d$ of molecules that complements graph- and surface-based representations. Modulated periodic activations give high-fidelity, resolution-independent representations. Two proof-of-concept architectures are demonstrated: an auto-decoder for the parametrization and super-resolution of a protein–ligand complex, and an auto-encoder for embedding molecular volumes in latent space. Using a protein–ligand complex (e.g., PDB 6EGA) and the FreeSolv data set, it reports reconstruction and upscaling PSNRs (38.5 and 27.2) and shows that the latent encodings capture shape and physicochemical properties, which enables latent interpolation between conformers. The framework holds promise for learning directly from electron density maps and for integrating diverse molecular information; the code is publicly available.
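To make the idea of a modulated periodic activation concrete, here is a minimal NumPy sketch of the forward pass of a latent-conditioned sine network mapping coordinates in $\mathbb{R}^3$ to $d$ output channels. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the frequency factor `w0`, the initialization, and the names `make_field` and `evaluate` are all hypothetical. The layer rule (a ReLU modulator that sees the latent at every depth and scales the sine features element-wise) follows the general scheme of modulated periodic activations as we understand it.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, scale):
    """Uniform weights in [-scale, scale] and a zero bias."""
    return rng.uniform(-scale, scale, size=(n_in, n_out)), np.zeros(n_out)

def make_field(latent_dim=8, hidden=32, depth=3, out_dim=2, w0=30.0):
    """Build parameters for a modulated sine network f(x; z): R^3 -> R^out_dim.

    All sizes and the init scheme are illustrative assumptions.
    """
    synth, mod = [], []
    for i in range(depth):
        s_in = 3 if i == 0 else hidden
        s_scale = 1.0 / s_in if i == 0 else np.sqrt(6.0 / s_in) / w0
        synth.append(init_layer(s_in, hidden, s_scale))
        # The modulator receives the latent again at every depth.
        m_in = latent_dim if i == 0 else hidden + latent_dim
        mod.append(init_layer(m_in, hidden, np.sqrt(6.0 / m_in)))
    head = init_layer(hidden, out_dim, np.sqrt(6.0 / hidden) / w0)
    return {"synth": synth, "mod": mod, "head": head, "w0": w0}

def evaluate(params, coords, z):
    """Evaluate the field at coords of shape (N, 3) for one latent z."""
    h = coords
    alpha = None
    for (ws, bs), (wm, bm) in zip(params["synth"], params["mod"]):
        m_in = z if alpha is None else np.concatenate([alpha, z])
        alpha = np.maximum(m_in @ wm + bm, 0.0)            # ReLU modulator
        h = alpha * np.sin(params["w0"] * (h @ ws + bs))   # modulated sine layer
    w, b = params["head"]
    return h @ w + b                                       # linear output head

params = make_field()
coords = rng.uniform(-1.0, 1.0, size=(5, 3))
values = evaluate(params, coords, np.ones(8))
print(values.shape)
```

Because the network takes continuous coordinates, the same parameters and latent can be queried on a grid of any resolution, which is what makes super-resolution decoding possible in principle.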

Abstract

Molecules have various computational representations, including numerical descriptors, strings, graphs, point clouds, and surfaces. Each representation method enables the application of various machine learning methodologies, from linear regression to graph neural networks paired with large language models. To complement existing representations, we introduce the representation of molecules through vector-valued functions, or $n$-dimensional vector fields, that are parameterized by neural networks, which we denote molecular neural fields. Unlike surface representations, molecular neural fields capture external features and the hydrophobic core of macromolecules such as proteins. Compared to discrete graph or point representations, molecular neural fields are compact, resolution-independent, and inherently suited for interpolation in spatial and temporal dimensions. These properties lend themselves to tasks including the generation of molecules based on their desired shape, structure, and composition, and the resolution-independent interpolation between molecular conformations in space and time. Here, we provide a framework and proofs-of-concept for molecular neural fields, namely, the parametrization and super-resolution reconstruction of a protein–ligand complex using an auto-decoder architecture and the embedding of molecular volumes in latent space using an auto-encoder architecture.
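As a rough illustration of what a sampled vector-valued function can look like, the sketch below evaluates a toy multi-channel density on a regular 3D grid. The choice of atom-centered Gaussians, the width `sigma`, the channel assignment, and the function name `sample_field` are assumptions made for this example only; the abstract does not specify which function the authors compute.

```python
import numpy as np

def sample_field(centers, channels, n_channels, resolution, sigma=0.15):
    """Sample a toy multi-channel density on a resolution^3 grid over [-1, 1]^3.

    centers:  (A, 3) hypothetical atom positions, already scaled to the cube
    channels: (A,) integer channel per atom, e.g. 0 = protein, 1 = ligand
    """
    axis = np.linspace(-1.0, 1.0, resolution)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)               # (R, R, R, 3)
    field = np.zeros((resolution,) * 3 + (n_channels,))
    for center, ch in zip(centers, channels):
        sq_dist = np.sum((grid - center) ** 2, axis=-1)  # squared distance to atom
        field[..., ch] += np.exp(-sq_dist / (2.0 * sigma**2))
    return field

# Three made-up atoms: two in channel 0, one in channel 1.
centers = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [-0.5, 0.2, 0.1]])
channels = np.array([0, 0, 1])
coarse = sample_field(centers, channels, n_channels=2, resolution=16)
fine = sample_field(centers, channels, n_channels=2, resolution=32)
print(coarse.shape, fine.shape)
```

The underlying function is the same at both resolutions; only the sampling density changes. Unlike a surface, the grid also carries values in the interior of the molecule.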

Paper Structure

This paper contains 6 sections, 1 equation, 4 figures.

Figures (4)

  • Figure 1: Upper row: Workflow and architecture overview of the presented methodology. For a given input molecule, a vector-valued function is computed and sampled on a 3D grid. Based on these samples, a neural network is trained that parametrizes the molecule using a modulated synthesis network with sine activations, as introduced by Mehta et al. (2021). Lower row: Based on the above workflow and architecture, a protein–ligand complex is encoded and parametrized by a modulated neural network in an auto-decoder setting. The two rightmost images show the reconstruction of the ligand and protein using the learned latent.
  • Figure 2: Super-resolution decoding of the protein–ligand complex (separated channels). The top row (a, b) shows a density plot at a resolution of $32\times32\times32$. The middle row (c, d) shows the up-scaled version at a resolution of $128\times128\times128$. The bottom row (e, f) shows the ground truth at a resolution of $128\times128\times128$. The peak signal-to-noise ratio between the up-scaled version and the ground truth is 75.7.
  • Figure 3: Embedding of molecular neural field latents of the FreeSolv data set. The scatter plots are colored by the Wildman–Crippen molar refractivity value (MolMR), the exact molecular weight (ExactMolWt), the topological polar surface area (TPSA), the Wildman–Crippen LogP value (MolLogP), the number of rotatable bonds (NumRotatableBonds), and the hydration free energy. The hydration free energy was provided with the FreeSolv data set; all other values were calculated using RDKit. On visual inspection, the plots show that the latent space is meaningful in terms of molecular structure and shape.
  • Figure 4: Smoothly interpolating between two molecules parametrized by the auto-encoder trained on the FreeSolv data set.
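
The captions of Figures 2 and 4 rely on two simple operations: comparing an up-scaled volume with its ground truth via the peak signal-to-noise ratio, and moving between two latent codes. A minimal sketch of both follows. The PSNR uses the standard definition $10\log_{10}(\mathrm{range}^2/\mathrm{MSE})$, with the peak taken as the value range of the reference, which is an assumption since the normalization used in the paper is not stated here. Linear blending of latents is likewise an assumed scheme, and the function names are hypothetical.

```python
import numpy as np

def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB between two equally shaped arrays."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if data_range is None:
        # Assumption: the peak is the value range of the reference volume.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(data_range**2 / mse)

def interpolate_latents(z_a, z_b, steps):
    """Linearly blend two latent codes; returns shape (steps, latent_dim)."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_a + t * z_b

# Toy volumes standing in for a ground truth and a noisy up-scaled decode.
rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 1.0, size=(8, 8, 8, 2))
decoded = truth + rng.normal(0.0, 0.01, size=truth.shape)
print(round(psnr(truth, decoded), 1))

# Five latents on the straight line between two made-up codes.
path = interpolate_latents(np.zeros(4), np.ones(4), steps=5)
print(path.shape)
```

Each latent along `path` would be passed to the decoder to render an intermediate volume; the smoothness of the result depends on how well the trained latent space is organized.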