Convolutional Networks on Graphs for Learning Molecular Fingerprints

David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams

TL;DR

The paper addresses learning properties of molecules when inputs are graphs of varying size and topology. It introduces neural graph fingerprints, a differentiable generalization of fixed circular fingerprints, enabling end-to-end gradient-based optimization over both local neighborhood updates and global pooling. The approach achieves competitive or superior predictive performance across solubility, drug efficacy, and photovoltaic efficiency tasks, while offering interpretable activations linked to chemical substructures. This work demonstrates a scalable pathway to data-driven molecular feature learning, with potential impact on QSAR, materials design, and virtual screening.

Abstract

We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predictive performance on a variety of tasks.

Paper Structure

This paper contains 30 sections, 1 equation, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: Left: A visual representation of the computational graph of both standard circular fingerprints and neural graph fingerprints. First, a graph is constructed matching the topology of the molecule being fingerprinted, in which nodes represent atoms, and edges represent bonds. At each layer, information flows between neighbors in the graph. Finally, each node in the graph turns on one bit in the fixed-length fingerprint vector. Right: A more detailed sketch including the bond information used in each operation.
  • Figure 2: Circular fingerprints
  • Figure 3: Left: Comparison of pairwise distances between molecules, measured using circular fingerprints and neural graph fingerprints with large random weights. Right: Predictive performance of circular fingerprints (red), neural graph fingerprints with fixed large random weights (green) and neural graph fingerprints with fixed small random weights (blue). The performance of neural graph fingerprints with large random weights closely matches the performance of circular fingerprints.
  • Figure 4: Examining fingerprints optimized for predicting solubility. Shown here are representative examples of molecular fragments (highlighted in blue) which most activate different features of the fingerprint. Top row: The feature most predictive of solubility. Bottom row: The feature most predictive of insolubility.
  • Figure 5: Visualizing fingerprints optimized for predicting toxicity. Shown here are representative samples of molecular fragments (highlighted in red) which most activate the feature most predictive of toxicity. Top row: the most predictive feature identifies groups containing a sulphur atom attached to an aromatic ring. Bottom row: the most predictive feature identifies fused aromatic rings, also known as polycyclic aromatic hydrocarbons, a well-known class of carcinogens.
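
The computation described in the Figure 1 caption (atoms as nodes, bonds as edges, information flowing between neighbors at each layer, and every node writing into a fixed-length fingerprint) can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation. The function names, the weight shapes, the tanh nonlinearity, the softmax "soft bit" write, and the toy three-atom molecule are all assumptions made for illustration, and bond features are omitted entirely.

```python
import math
import random


def matvec(vec, mat):
    """Row vector times matrix (mat is a list of rows)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def softmax(xs):
    """Numerically stable softmax; the output sums to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale):
    """Hypothetical weight initializer: Gaussian entries with the given scale."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def neural_fingerprint(atom_features, neighbors, hidden_weights, output_weights):
    """Sketch of a neural graph fingerprint (assumed form, no bond features).

    atom_features: one feature vector per atom.
    neighbors: neighbors[a] lists the indices of atoms bonded to atom a.
    hidden_weights / output_weights: one matrix per layer.
    """
    fingerprint = [0.0] * len(output_weights[0][0])
    states = [list(f) for f in atom_features]
    for H, W in zip(hidden_weights, output_weights):
        new_states = []
        for a, state in enumerate(states):
            # Information flows between neighbors: sum own and neighbor states.
            pooled = list(state)
            for n in neighbors[a]:
                pooled = [p + s for p, s in zip(pooled, states[n])]
            # Smooth update replaces the hash used by circular fingerprints.
            hidden = [math.tanh(x) for x in matvec(pooled, H)]
            new_states.append(hidden)
            # Softmax write replaces "turn on one bit" with a soft assignment.
            contribution = softmax(matvec(hidden, W))
            fingerprint = [f + c for f, c in zip(fingerprint, contribution)]
        states = new_states
    return fingerprint


# Toy molecule: heavy atoms of ethanol, C-C-O, one-hot over (C, N, O).
features = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
bonds = [[1], [0, 2], [1]]

rng = random.Random(0)
num_layers, hidden_dim, fp_len = 2, 3, 8
hidden_ws = [random_matrix(rng, hidden_dim, hidden_dim, 1.0) for _ in range(num_layers)]
output_ws = [random_matrix(rng, hidden_dim, fp_len, 1.0) for _ in range(num_layers)]

fp = neural_fingerprint(features, bonds, hidden_ws, output_ws)

# Relabel the atoms as O-C-C: the bond lists are unchanged, the features reverse.
fp_relabeled = neural_fingerprint(features[::-1], bonds, hidden_ws, output_ws)
print([round(x, 3) for x in fp])
```

Because every step is a sum over atoms or a smooth function, the fingerprint in this sketch does not depend on how the atoms are numbered, and it is differentiable with respect to the weight matrices, which is what makes end-to-end training possible. Each atom contributes one softmax vector per layer, so the entries of `fp` sum to the number of atoms times the number of layers.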