Thermodynamics-Informed Accurate pKa Prediction and Protonation State Generation in PlayMolecule AI

Francesco Pesce, Stephen Farr, Gianni de Fabritiis

Abstract

Accurate prediction of acid dissociation constants (p$K_{\rm a}$) and determination of the dominant protonation states are critical in drug discovery, because they influence molecular properties such as solubility, permeability, and protein-ligand binding. We present Acep$K_{\rm a}$, an advanced application integrated into the PlayMolecule AI platform. Acep$K_{\rm a}$ is built upon the theoretically rigorous Uni-p$K_{\rm a}$ framework, which unifies statistical mechanics with representation learning. By modeling the complete protonation ensemble rather than treating p$K_{\rm a}$ as a scalar regression target, Acep$K_{\rm a}$ ensures thermodynamic consistency across coupled ionization sites. We describe the application's enhanced architecture, which features a retrained Uni-Mol backbone that achieves state-of-the-art performance on standard benchmarks. We also detail three engineering advancements: AceConfgen, a proprietary GPU-accelerated conformer generator that achieves a ~40x speed-up compared to NVIDIA's nvMolKit; a streamlined inference engine that protonates molecules directly; and a 3D-aware modality for applying protonation states to bound ligand poses. Finally, we discuss the integration of Acep$K_{\rm a}$ into the PlayMolecule AI ecosystem, a modern AI-assisted environment for molecular modelling and drug discovery.

Paper Structure

This paper contains 10 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Benchmark of models on public p$K_{\rm a}$ datasets. All data except those for Acep$K_{\rm a}$ are taken from luo2024bridging and wagen2025physics.
  • Figure 2: Acep$K_{\rm a}$ app workflow. (1) The app takes as input one or more SMILES strings (or 3D conformers) and a pH value. (2) It then generates the protonation ensemble for each of the provided SMILES. (3) A conformational ensemble for each of the microstates is generated using nvMolKit. (4) The 3D conformations are fed to the Uni-Mol model, which predicts their free energies. (5) The free energies are used in Eq. \ref{eq:boltz_weights} to derive the relative pH-dependent population of each microstate.
  • Figure 3: Output example from Acep$K_{\rm a}$ in single-molecule mode using histamine as query molecule. The figure shows the enumerated microstates on the left and their pH-dependent relative population on the right.
  • Figure 4: Comparison of AceConfgen and nvMolKit performance on the Platinum Benchmark. (a) Distributions of minimum RMSD values. (b) Total runtime required to complete the benchmark.
  • Figure 5: PlayMolecule AI GUI and Acep$K_{\rm a}$ usage orchestrated by the LLM agent, showing the prediction of the protonation state of a ligand bound in a protein pocket. (a) The agent is asked to fetch the structure with PDB ID 3MJ2. (b) The agent is asked to predict the protonation state of the ligand present in the loaded structure at a target pH, and to visualize the results.
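
The last step of the workflow in Figure 2, turning per-microstate free energies into pH-dependent populations, can be illustrated with a minimal Python sketch. The equation itself is not reproduced in this listing, so the sketch assumes the standard statistical-mechanics form in which a microstate with free energy $G_i$ and $n_i$ titratable protons has weight proportional to $\exp(-G_i/RT)\,10^{-n_i\,\mathrm{pH}}$. The function names, units (kcal/mol), and toy values are hypothetical and are not taken from Acep$K_{\rm a}$.

```python
import math

# Molar gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def microstate_populations(free_energies, proton_counts, pH, temperature=298.15):
    """Normalized pH-dependent populations of protonation microstates.

    Assumed form (not copied from the paper):
        w_i ∝ exp(-G_i / RT) * 10 ** (-n_i * pH)
    free_energies: G_i in kcal/mol, on a common reference.
    proton_counts: n_i, number of titratable protons in microstate i.
    """
    if len(free_energies) != len(proton_counts):
        raise ValueError("free_energies and proton_counts must have equal length")
    if not free_energies:
        raise ValueError("at least one microstate is required")
    rt = R_KCAL * temperature
    ln10 = math.log(10.0)
    # Work in log space and subtract the maximum for numerical stability.
    log_w = [-g / rt - n * ln10 * pH for g, n in zip(free_energies, proton_counts)]
    shift = max(log_w)
    weights = [math.exp(x - shift) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def micro_pka(g_protonated, g_deprotonated, temperature=298.15):
    """Microscopic pKa implied by the same convention: (G_A - G_HA) / (RT ln 10)."""
    rt = R_KCAL * temperature
    return (g_deprotonated - g_protonated) / (rt * math.log(10.0))


# Toy two-state acid HA <-> A- + H+ with a chosen pKa of 5 (hypothetical values).
rt_ln10 = R_KCAL * 298.15 * math.log(10.0)
g_ha, g_a = 0.0, 5.0 * rt_ln10
populations = microstate_populations([g_ha, g_a], [1, 0], pH=7.0)
print(f"HA: {populations[0]:.4f}  A-: {populations[1]:.4f}")
```

Because every population is derived from one shared set of free energies, the microscopic p$K_{\rm a}$ values implied by any pair of microstates close every thermodynamic cycle by construction, which is the consistency property that scalar p$K_{\rm a}$ regression does not guarantee.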