
Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches

Xuefeng Liu, Songhao Jiang, Xiaotian Duan, Archit Vasan, Qinan Huang, Chong Liu, Michelle M. Li, Heng Ma, Thomas Brettin, Arvind Ramanathan, Fangfang Xia, Mengdi Wang, Abhishek Pandey, Marinka Zitnik, Ian T. Foster, Jinbo Xu, Rick L. Stevens

TL;DR

This review surveys the landscape of protein–ligand binding affinity prediction, tracing its evolution from traditional physics-based scoring to machine learning (ML) and deep learning approaches, along with the associated datasets and evaluation benchmarks. It highlights two ML paradigms, interaction-free and interaction-based, together with various representations (sequences, graphs, voxels, and multimodal embeddings) and pretrained protein language models, and discusses how these methods are applied to established datasets such as PDBbind, CASF, and BindingDB. The paper critically analyzes evaluation frameworks across scoring, docking, ranking, and screening tasks, emphasizing data leakage, bias, and the need for robust benchmarks and dynamic data. It concludes with future directions, including dataset quality and balance, hybrid physics–ML models, explicit handling of protein flexibility, richer evaluation metrics, and the emergence of AI-driven in silico frameworks such as AI virtual cells (AIVCs) for system-level, personalized predictions in precision pharmacology.
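
Scoring and ranking evaluations are usually reported as correlation coefficients. In CASF-2016, for example, scoring power is measured by the Pearson correlation between predicted and experimental affinities over all complexes, and ranking power is summarized by, among other statistics, the Spearman correlation computed within each target cluster and averaged across clusters. The following is a minimal, dependency-free sketch of those two metrics. It assumes predictions are on the same "higher is tighter" scale as the experimental values (e.g. pKd). The function names and toy numbers are hypothetical, and the sketch omits the other statistics that the official CASF scripts report.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (non-constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if a sequence is constant


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


def scoring_power(pred: Sequence[float], exp: Sequence[float]) -> float:
    # Scoring-power-style metric: linear correlation over all complexes.
    return pearson_r(pred, exp)


def ranking_power(clusters: Dict[str, List[Tuple[float, float]]]) -> float:
    # Ranking-power-style metric: Spearman rho within each target cluster,
    # averaged over clusters. Each tuple is (predicted, experimental).
    rhos = []
    for pairs in clusters.values():
        pred = [p for p, _ in pairs]
        exp = [e for _, e in pairs]
        rhos.append(spearman_rho(pred, exp))
    return mean(rhos)


if __name__ == "__main__":
    # Hypothetical (predicted, experimental) pKd pairs, not data from the paper.
    clusters = {
        "target_1": [(6.1, 5.9), (7.4, 7.8), (8.0, 8.3)],
        "target_2": [(5.0, 6.2), (5.5, 5.8), (6.9, 7.1)],
    }
    all_pairs = [pair for pairs in clusters.values() for pair in pairs]
    print("scoring power (Pearson R):",
          round(scoring_power([p for p, _ in all_pairs], [e for _, e in all_pairs]), 3))
    print("ranking power (mean Spearman rho):", round(ranking_power(clusters), 3))
```

On these toy values the first cluster is ranked perfectly (rho = 1) and the second has one swapped pair (rho = 0.5), so the mean ranking-power value is about 0.75. This illustrates why a model can score well globally yet still misorder ligands of the same target.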

Abstract

Protein–ligand binding is the process by which a small molecule (such as a drug or inhibitor) attaches to a target protein. Binding affinity, which characterizes the strength of biomolecular interactions, is essential for tackling diverse challenges in the life sciences, including therapeutic design, protein engineering, enzyme optimization, and the elucidation of biological mechanisms. Much work has been devoted to predicting binding affinity over the past decades. Here, we review significant recent work, with a focus on methods, evaluation strategies, and benchmark datasets. We note the growing use of both traditional machine learning and deep learning models for binding affinity prediction, accompanied by an increasing amount of data on proteins and small drug-like molecules. With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction; reciprocally, progress in building binding affinity predictors can refine AIVCs. Future efforts in binding affinity prediction and AI-driven in silico models can improve the simulation of temporal dynamics, cell-type specificity, and multi-omics integration to support more accurate and personalized outcomes.
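
Binding strength is typically reported as an affinity constant (e.g. Kd, Ki, or IC50 in datasets such as PDBbind and BindingDB), and regression models are commonly trained on its negative base-10 logarithm (pKd, pKi, ...). A dissociation constant can also be converted to a standard binding free energy via ΔG° = RT ln(Kd / c°) with c° = 1 M. The following is a minimal sketch of these two conversions, assuming a temperature of 298.15 K. The function names are hypothetical, and applying the free-energy formula to Ki or IC50 values as if they were Kd is a simplification, not an equivalence.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant in kcal/(mol*K)


def p_affinity(k_molar: float) -> float:
    """Return -log10 of an affinity constant given in mol/L (e.g. pKd)."""
    if k_molar <= 0:
        raise ValueError("affinity constant must be positive")
    return -math.log10(k_molar)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol: dG = RT ln(Kd / 1 M).

    More negative values correspond to tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    for label, kd in [("1 uM", 1e-6), ("10 nM", 1e-8), ("1 nM", 1e-9)]:
        print(f"Kd = {label}: pKd = {p_affinity(kd):.2f}, "
              f"dG = {binding_free_energy(kd):.2f} kcal/mol")
```

At 298.15 K each tenfold decrease in Kd lowers ΔG by RT ln 10 ≈ 1.36 kcal/mol, so a 1 nM binder (pKd = 9) corresponds to roughly -12.3 kcal/mol. Working on the logarithmic scale keeps regression targets in a compact numeric range.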

Paper Structure (26 sections, 26 equations, 1 figure, 3 tables)

Figures (1)

  • Figure 1: Overview of binding affinity. (A) Binding affinity pipeline. Computational estimates of binding affinity are typically obtained using molecular docking simulations as a surrogate: for each compound, the simulation searches for an optimal binding pose and produces a score, and these scores are then used to rank the compounds against one another. (B) Binding example in surface view (PDB ID: 10GS). (C) Detailed pipeline. For a given protein and a set of candidate molecules, we perform both single-site and multi-site docking. The top row illustrates the search for optimal molecules and corresponding binding poses across multiple potential binding pockets, while the bottom row depicts the docking of various molecules into a single predefined pocket. By combining these two strategies, we aim to identify the most favorable molecule, binding pose, and docking score. (D) Application domains. Binding affinity is crucial across drug discovery, biologics design, diagnostics, and precision medicine. It guides the identification and optimization of molecules, such as small drugs, antibodies, or probes, for strong and selective target binding.
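
Panels (A) and (C) describe a selection step that follows docking. Each compound's best-scoring pose is kept, either across all pockets (multi-site) or within one predefined pocket (single-site), and the compounds are then ranked by that score. The following is a minimal sketch of only this post-processing on hypothetical docking output. It does not run a docking engine, and it assumes a Vina-style convention in which a lower (more negative) score is more favorable. The `DockingResult` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class DockingResult:
    compound: str
    pocket: str
    pose_id: int
    score: float  # assumed convention: lower (more negative) = more favorable


def best_pose_per_compound(
    results: Iterable[DockingResult], pocket: Optional[str] = None
) -> Dict[str, DockingResult]:
    """Keep each compound's best-scoring pose.

    pocket=None searches every pocket (multi-site docking); a pocket name
    restricts the search to that predefined site (single-site docking).
    """
    best: Dict[str, DockingResult] = {}
    for r in results:
        if pocket is not None and r.pocket != pocket:
            continue
        current = best.get(r.compound)
        if current is None or r.score < current.score:
            best[r.compound] = r
    return best


def rank_compounds(
    results: Iterable[DockingResult], pocket: Optional[str] = None
) -> List[DockingResult]:
    """Rank compounds by their best pose score, most favorable first."""
    best = best_pose_per_compound(results, pocket)
    # Ties are broken by compound name so the ranking is deterministic.
    return sorted(best.values(), key=lambda r: (r.score, r.compound))


if __name__ == "__main__":
    # Hypothetical docking output (illustrative numbers, not from the paper).
    results = [
        DockingResult("cmpd_A", "pocket_1", 0, -7.2),
        DockingResult("cmpd_A", "pocket_2", 0, -8.1),
        DockingResult("cmpd_B", "pocket_1", 0, -9.0),
        DockingResult("cmpd_B", "pocket_2", 1, -6.5),
        DockingResult("cmpd_C", "pocket_1", 2, -7.9),
    ]
    for mode, pocket in [("multi-site", None), ("single-site pocket_2", "pocket_2")]:
        ranked = rank_compounds(results, pocket)
        print(mode, [(r.compound, r.pocket, r.score) for r in ranked])
```

In the multi-site run, cmpd_B ranks first through its pose in pocket_1. Restricting the search to pocket_2 instead puts cmpd_A first and drops cmpd_C, which has no pose there. As the caption notes, docking scores are a surrogate for affinity, so such a ranking is a prioritization rather than a measured affinity.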