Reversible Deep Learning for 13C NMR in Chemoinformatics: On Structures and Spectra

Stefan Kuhn, Vandana Dwarka, Przemyslaw Karol Grenda, Eero Vainikko

TL;DR

This work addresses bidirectional inference between molecular structures and $^{13}$C NMR spectra, a problem whose inverse direction (spectrum to structure) is inherently one-to-many. It introduces a conditional invertible neural network built from i-RevNet blocks, whose output is divided into a 128-bit spectrum code $Y_{latent}$ and an 896-bit residual $Z_{free}$ to preserve information and encode uncertainty. The model demonstrates forward spectrum prediction with a tunable spectrum latent and numerical invertibility on training examples, while inversion on unseen spectra yields coarse but meaningful structural signals. Together, this provides a principled end-to-end framework for spectrum prediction and uncertainty-aware candidate structure generation. The accompanying code is publicly available, enabling replication and extension in chemoinformatics applications.

Abstract

We introduce a reversible deep learning model for 13C NMR that uses a single conditional invertible neural network for both directions between molecular structures and spectra. The network is built from i-RevNet-style bijective blocks, so the forward map and its inverse are available by construction. We train the model to predict a 128-bit binned spectrum code from a graph-based structure encoding, while the remaining latent dimensions capture residual variability. At inference time, we invert the same trained network to generate structure candidates from a spectrum code, which explicitly represents the one-to-many nature of spectrum-to-structure inference. On a filtered subset, the model is numerically invertible on trained examples, achieves spectrum-code prediction above chance, and produces coarse but meaningful structural signals when inverted on validation spectra. These results demonstrate that invertible architectures can unify spectrum prediction and uncertainty-aware candidate generation within one end-to-end model.
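
The idea of a bijective network with a split latent can be illustrated with a minimal sketch. The code below is not the authors' implementation: it replaces the i-RevNet blocks with plain additive coupling layers, uses fixed random weights instead of trained ones, and operates on an already flattened 1024-dimensional input. The names `AdditiveCoupling`, `forward`, and `inverse` are hypothetical. It only shows why the inverse exists by construction and how resampling the 896 free dimensions yields several inputs that share one 128-dimensional spectrum code.

```python
import math
import random

random.seed(0)
DIM, Y_BITS = 1024, 128  # 128 spectrum dims + 896 free dims, as in the paper


class AdditiveCoupling:
    """Toy coupling layer: (a, b) -> (a + F(b), b), invertible for any F."""

    def __init__(self, half, flip):
        # Assumption: fixed random weights stand in for a learned function F.
        self.w = [random.gauss(0.0, 1.0) for _ in range(half)]
        self.flip = flip  # alternate which half gets updated

    def _f(self, h):
        # Mix each entry with its neighbour so F is not purely elementwise.
        n = len(h)
        return [math.tanh(self.w[i] * h[i] + h[(i + 1) % n]) for i in range(n)]

    def forward(self, x):
        half = len(x) // 2
        a, b = x[:half], x[half:]
        if self.flip:
            a, b = b, a
        a = [ai + fi for ai, fi in zip(a, self._f(b))]
        if self.flip:
            a, b = b, a
        return a + b  # list concatenation

    def inverse(self, y):
        half = len(y) // 2
        a, b = y[:half], y[half:]
        if self.flip:
            a, b = b, a
        a = [ai - fi for ai, fi in zip(a, self._f(b))]  # undo the addition
        if self.flip:
            a, b = b, a
        return a + b


blocks = [AdditiveCoupling(DIM // 2, flip=(i % 2 == 1)) for i in range(4)]


def forward(x):
    """Map an input vector to (y_latent, z_free)."""
    for blk in blocks:
        x = blk.forward(x)
    return x[:Y_BITS], x[Y_BITS:]


def inverse(y_latent, z_free):
    """Run the same blocks backwards to recover an input vector."""
    x = list(y_latent) + list(z_free)
    for blk in reversed(blocks):
        x = blk.inverse(x)
    return x


x = [random.gauss(0.0, 1.0) for _ in range(DIM)]
y_latent, z_free = forward(x)
x_rec = inverse(y_latent, z_free)
max_err = max(abs(u - v) for u, v in zip(x, x_rec))

# One-to-many inversion: keep the spectrum code, resample the free part.
candidates = [
    inverse(y_latent, [random.gauss(0.0, 1.0) for _ in range(DIM - Y_BITS)])
    for _ in range(3)
]
```

In this sketch, `max_err` stays at floating-point round-off level, and every entry of `candidates` maps back to the same `y_latent` under `forward`. How the real model turns such candidates into valid molecular graphs is not covered here.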

Paper Structure (13 sections, 5 equations, 4 figures, 4 tables)

This paper contains 13 sections, 5 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Two ways to represent the structure of (-)-Menthol.
  • Figure 2: A $^{13}$C spectrum of (-)-Menthol.
  • Figure 3: F1 and loss values during training.
  • Figure 4: The network consists of a sequence of invertible i-RevNet blocks, which progressively transform the four input matrices into a 1D vector. The final latent space is conceptually partitioned into 128 bits representing a spectrum ($Y_{latent}$) and 896 unconstrained bits ($Z_{free}$). Black dots indicate omitted layers or cells, shown schematically for visual clarity. Black boxes indicate values of 1; the first layer shows a benzene molecule, with three single and three double bonds forming a ring and all bonds marked as aromatic. The 1s in later layers are placed randomly for illustrative purposes.
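
The benzene example in the Figure 4 caption can be made concrete with a small sketch. The caption does not say what the four input matrices are, so the code below assumes one binary atom-by-atom matrix per bond channel (single, double, triple, aromatic). That channel layout, the atom ordering, and the helper name `encode_bonds` are assumptions for illustration only, not the encoding confirmed by the paper.

```python
# Assumed channel layout: one 0/1 adjacency matrix per bond property.
CHANNELS = ("single", "double", "triple", "aromatic")


def encode_bonds(n_atoms, bonds, aromatic_bonds):
    """Return {channel: n_atoms x n_atoms symmetric 0/1 matrix}."""
    mats = {c: [[0] * n_atoms for _ in range(n_atoms)] for c in CHANNELS}
    for i, j, order in bonds:
        mats[order][i][j] = mats[order][j][i] = 1
    for i, j in aromatic_bonds:
        mats["aromatic"][i][j] = mats["aromatic"][j][i] = 1
    return mats


# Benzene as a Kekule structure: six ring carbons, alternating bond orders.
ring = [(k, (k + 1) % 6) for k in range(6)]
benzene_bonds = [
    (i, j, "double" if k % 2 == 0 else "single")
    for k, (i, j) in enumerate(ring)
]

# Every ring bond is additionally flagged as aromatic, as in the caption.
mats = encode_bonds(6, benzene_bonds, aromatic_bonds=ring)

for name in CHANNELS:
    print(name)
    for row in mats[name]:
        print(" ", row)
```

Each undirected bond sets two symmetric cells, so the single and double matrices each contain six 1s, the aromatic matrix contains twelve, and the triple matrix stays empty for benzene.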