Generative artificial intelligence for computational chemistry: a roadmap to predicting emergent phenomena

Pratyush Tiwary; Lukas Herron; Richard John; Suemin Lee; Disha Sanwal; Ruiyu Wang

TL;DR

This Perspective addresses the challenge of predicting emergent chemical phenomena with Generative AI by surveying foundational concepts in both computational chemistry and AI, and by outlining a spectrum of AI methods (autoencoders, GANs, RL, flow models, and LLMs) tailored to molecular modeling. It highlights representative applications in ab initio QC, ML-based force fields, and biomolecular structure prediction (protein and RNA), while critically examining limitations such as data scarcity, training stability, and the difficulty of capturing emergent behavior. The authors argue that integrating core chemical principles, especially statistical mechanics and environmental context, is essential for turning AI into a reliable predictive tool for chemistry. They propose design principles and hybrid approaches (e.g., AF2RAVE, Thermodynamic Maps) to bridge AI with physics, aiming to predict functions and emergent phenomena from chemical identity under realistic conditions. The outlook emphasizes cautious, physics-grounded progress and the need for rigorous validation to realize AI's potential to accelerate discovery and deepen understanding of complex chemical systems.

Abstract

The recent surge in Generative Artificial Intelligence (AI) has introduced exciting possibilities for computational chemistry. Generative AI methods have made significant progress in sampling molecular structures across chemical species, developing force fields, and speeding up simulations. This Perspective offers a structured overview, beginning with the fundamental theoretical concepts in both Generative AI and computational chemistry. It then covers widely used Generative AI methods, including autoencoders, generative adversarial networks, reinforcement learning, flow models, and language models, and highlights selected applications in diverse areas, including force field development and protein/RNA structure prediction. A key focus is on the challenges these methods face before they become truly predictive, particularly in predicting emergent chemical phenomena. We believe that the ultimate goal of a simulation method or theory is to predict phenomena not seen before, and that Generative AI should be held to this same standard before it is deemed useful for chemistry. We suggest that to overcome these challenges, future AI models need to integrate core chemical principles, especially from statistical mechanics.

Paper Structure (15 sections, 1 equation, 2 figures)

The Theoretical Minimum
Computational chemistry
Generative AI
Generative AI methods for computational chemistry
Autoencoders and derived methods
Generative adversarial networks (GANs)
Reinforcement learning
Flow based methods
Recurrent neural networks and large language models
Selected applications
Ab initio quantum chemistry and coarse-grained force fields
Protein structure and conformation prediction
RNA structure prediction
Desirables from Generative AI for chemistry
Critical assessment and outlook

Figures (2)

Figure 1: Overall framework of Generative AI and the methods discussed in this Perspective. The central task in Generative AI is to generate new data that is similar to the training data or to model the underlying distribution of the data when the probability distribution is not explicitly available. This challenge is particularly relevant in chemistry, where data can be structured (e.g., molecular graphs) or unstructured, time series or static. The methods discussed in this Perspective include (I.) Autoencoder (AE): An architecture where the encoder compresses the input data $x$ into a latent space $z$, and the decoder reconstructs the data from $z$ to produce an output $x'$ which should closely match the input. (II.) Generative Adversarial Networks (GANs): A framework comprising a generator that produces synthetic data $x'$ from latent variables $z$ and a discriminator that distinguishes between real data $x$ and generated data $x'$. The generator and discriminator are trained together in an adversarial process, with the generator improving its ability to create realistic data as the discriminator refines its ability to detect fakes. (III.) Reinforcement Learning (RL): A learning paradigm where an agent, typically a decision-making entity, interacts with an environment over time $t$. The agent takes actions $a_t$ based on the current state $S_t$ of the environment, and in return, it receives rewards $R_t$. Through this process, the agent learns to maximize cumulative rewards by refining its strategy or policy over successive iterations. (IV.) Flow models: These models learn to transform complex probability distributions of the data $x$ into simpler, tractable prior distributions $z$ using invertible functions $f(x)$. Given data from a complex true distribution, these models enable the mapping to a simpler latent space, from which new data can be generated by inverting the transformation. (V.) Large Language Models (LLMs): A typical LLM consists of an encoder and a decoder, both composed of multiple transformer layers. These layers use self-attention mechanisms to understand and focus on the most relevant parts of the input sequence, facilitating the generation of coherent and contextually appropriate outputs in a variety of natural language processing tasks.
Figure 2: Desirables for predicting emergent phenomena in chemistry with Generative AI. The goal for Generative AI, molecular simulations, and computational chemistry is to start from the chemical identity (whether sequence or composition) and accurately predict function while considering the relevant environmental conditions. To achieve this, the intermediate rungs of structure, thermodynamic ensemble, and environment must be accounted for, where the environment can be quantified through parameters such as the temperature $T$, pressure $P$, and chemical potential $\mu$. As we move up each level, modeling becomes more challenging due to the nuances of fluctuations governed by the laws of equilibrium and non-equilibrium statistical mechanics. This will require Generative AI models deeply grounded in statistical mechanics, with precise priors to account for complex interactions, dynamics, and environmental effects.
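
The flow-model idea in Figure 1 (IV.), an invertible map $f(x)$ from data $x$ to a simple prior $z$, can be illustrated with a one-dimensional toy example. The sketch below is not taken from the Perspective: the class name `AffineFlow1D`, its parameters `shift` and `scale`, and the synthetic Gaussian data are all assumptions chosen for illustration. It uses the simplest possible invertible map, $z = (x - \mathrm{shift})/\mathrm{scale}$, with a standard normal prior, and evaluates the data density through the change-of-variables formula $\log p_X(x) = \log p_Z(f(x)) + \log |df/dx|$.

```python
import math
import random


class AffineFlow1D:
    """Toy 1D flow: z = f(x) = (x - shift) / scale (hypothetical illustration)."""

    def __init__(self, shift=0.0, scale=1.0):
        self.shift = shift
        self.scale = scale

    def forward(self, x):
        # Map a data point x to the latent variable z.
        return (x - self.shift) / self.scale

    def inverse(self, z):
        # Map a latent variable z back to data space.
        return self.shift + self.scale * z

    def log_prob(self, x):
        # Change of variables: log p_X(x) = log p_Z(f(x)) + log|df/dx|.
        z = self.forward(x)
        log_prior = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
        log_det = -math.log(abs(self.scale))  # df/dx = 1 / scale
        return log_prior + log_det

    def fit(self, data):
        # For this affine map with a standard normal prior, the
        # maximum-likelihood parameters are the sample mean and the
        # (population) standard deviation, so no optimizer is needed.
        n = len(data)
        self.shift = sum(data) / n
        variance = sum((x - self.shift) ** 2 for x in data) / n
        self.scale = math.sqrt(variance)
        return self

    def sample(self, n, rng):
        # Generate new data: draw z from the prior, then invert the map.
        return [self.inverse(rng.gauss(0.0, 1.0)) for _ in range(n)]


# Synthetic "training data" (an assumption for this sketch).
rng = random.Random(0)
data = [rng.gauss(3.0, 0.5) for _ in range(5000)]

flow = AffineFlow1D().fit(data)
new_samples = flow.sample(5000, rng)
```

Practical flow models replace the affine map with a stack of learned invertible neural-network layers and train by gradient ascent on `log_prob`, but the two ingredients shown here, an invertible transformation and a tractable Jacobian term, are the same.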
