Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches
Xuefeng Liu, Songhao Jiang, Xiaotian Duan, Archit Vasan, Qinan Huang, Chong Liu, Michelle M. Li, Heng Ma, Thomas Brettin, Arvind Ramanathan, Fangfang Xia, Mengdi Wang, Abhishek Pandey, Marinka Zitnik, Ian T. Foster, Jinbo Xu, Rick L. Stevens
TL;DR
This review surveys the protein–ligand binding affinity prediction landscape, tracing the evolution from traditional physics-based scoring to machine learning and deep learning approaches, along with the associated datasets and evaluation benchmarks. It highlights two ML paradigms, interaction-free and interaction-based, along with various representations (sequences, graphs, voxels, and multimodal embeddings) and pretrained protein language models, and discusses how these methods make use of established datasets such as PDBbind, CASF, and BindingDB. The paper critically analyzes evaluation frameworks across scoring, docking, ranking, and screening, emphasizing issues of data leakage and bias and the need for robust benchmarks and dynamic data. It concludes with future directions including dataset quality and balance, hybrid physics–ML models, explicit handling of protein flexibility, richer evaluation metrics, and the emergence of AI-driven in silico frameworks, such as AI virtual cells (AIVCs), for system-level, personalized predictions in precision pharmacology.
Abstract
Protein–ligand binding is the process by which a small molecule (such as a drug or inhibitor) attaches to a target protein. Binding affinity, which characterizes the strength of biomolecular interactions, is essential for tackling diverse challenges in life sciences, including therapeutic design, protein engineering, enzyme optimization, and elucidating biological mechanisms. Much work has been devoted to predicting binding affinity over the past decades. Here, we review significant recent work, with a focus on methods, evaluation strategies, and benchmark datasets. We note growing use of both traditional machine learning and deep learning models for predicting binding affinity, accompanied by an increasing amount of data on proteins and small drug-like molecules. With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction; reciprocally, progress in building binding affinity predictors can refine AIVCs. Future efforts in binding affinity prediction and AI-driven in silico models can enhance the simulation of temporal dynamics, cell-type specificity, and multi-omics integration to support more accurate and personalized outcomes.
