Learning inducing points and uncertainty on molecular data by scalable variational Gaussian processes

Mikhail Tsitsvero; Mingoo Jin; Andrey Lyalin

TL;DR

The paper tackles uncertainty control and scalability in Gaussian processes for molecular data by employing scalable variational GPs with learnable inducing points and analyzing multiple training objectives. It demonstrates that variational inducing points can represent configurations across molecular types and that predictive log-likelihood yields superior uncertainty estimation at a slight cost to accuracy, validated on energies and atomic forces with SOAP descriptors. A large molecular crystal case shows SVGPs can match exact GP performance for force predictions while maintaining sparsity, highlighting practicality for autonomous ML pipelines in chemistry. Overall, the work provides a scalable GP framework with robust uncertainty handling for high-dimensional molecular descriptors, applicable to both small molecules and extended crystalline systems.

Abstract

Uncertainty control and scalability to large datasets are the two main issues for deploying Gaussian process (GP) models within autonomous machine-learning-based prediction pipelines in materials science and chemistry. One way to address both issues is to introduce latent inducing-point variables and choose the right approximation to the marginal log-likelihood objective. Here, we empirically show that variational learning of the inducing points in a molecular descriptor space improves the prediction of energies and atomic forces on two molecular dynamics datasets. First, we show that variational GPs can learn to represent configurations of molecules of different types that were not present in the initialization set of configurations. We provide a comparison of alternative log-likelihood training objectives and variational distributions. Among several evaluated approximate marginal log-likelihood objectives, we show that the predictive log-likelihood provides excellent uncertainty estimates at a slight expense of predictive quality. Furthermore, we extend our study to a large molecular crystal system, showing that variational GP models perform well for predicting atomic forces by efficiently learning a sparse representation of the dataset.
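To make the comparison of training objectives concrete, the following is a minimal sketch of the per-point data-fit terms of two commonly used objectives, the variational lower bound (ELBO) and the predictive log-likelihood (PLL). It assumes a Gaussian likelihood and a Gaussian variational marginal q(f) = N(mu, s2) at each data point; these assumptions, the function names, and the toy numbers are illustrative and not taken from the paper. The KL regularizer on the inducing variables, which both objectives also include, is omitted.

```python
import numpy as np

def gaussian_log_pdf(y, mean, var):
    """Log density of N(y | mean, var), elementwise."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def elbo_data_term(y, mu, s2, noise):
    """ELBO data term: E_{q(f)}[log N(y | f, noise)] with q(f) = N(mu, s2).

    The latent variance s2 enters only as a penalty, -s2 / (2 * noise).
    """
    return gaussian_log_pdf(y, mu, noise) - 0.5 * s2 / noise

def pll_data_term(y, mu, s2, noise):
    """PLL data term: log E_{q(f)}[N(y | f, noise)] = log N(y | mu, s2 + noise).

    The latent variance s2 is part of the predictive density that is fitted.
    """
    return gaussian_log_pdf(y, mu, s2 + noise)

# Hypothetical toy values: targets, variational means and variances, noise variance.
y = np.array([0.0, 1.0, -2.0])
mu = np.array([0.1, 0.5, -1.0])
s2 = np.array([0.0, 0.3, 2.0])
noise = 0.5

elbo_terms = elbo_data_term(y, mu, s2, noise)
pll_terms = pll_data_term(y, mu, s2, noise)
print("ELBO terms:", elbo_terms)
print("PLL terms: ", pll_terms)
```

In this sketch the PLL term scores each target under the full predictive variance s2 + noise, so the latent variance is trained to explain the residuals directly. In the ELBO term the latent variance appears only as a penalty, which is one way to see why the two objectives can trade predictive accuracy against calibration differently.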

Paper Structure (14 sections, 15 equations, 6 figures, 2 tables)

Introduction
Fitting molecular data with scalable Gaussian processes
Invariant Descriptors
Scalable Variational Gaussian processes
Experiments
Fitting energies of molecular isomers
Fitting atomic forces of molecular crystal
Initialization of inducing points and stochasticity of the training
Interpolation vs. extrapolation
Conclusion
Experiment details
Fitting energies of molecular isomers
Fitting forces of molecular crystal
Computational details

Figures (6)

Figure 1: Structural isomers of C3H8O.
Figure 2:
Figure 3: (a) Machine learning scheme for modeling atomic forces in the molecular crystal shown in (b). A collection of 14 GP models was trained. Each GP model corresponds to a group of atoms with similar local atomic environments within the molecular crystal. (b) Molecular crystal system with dynamic gearing parts.
