BioLM-Score: Language-Prior Conditioned Probabilistic Geometric Potentials for Protein-Ligand Scoring

Zhangfan Yang, Baoyun Chen, Dong Xu, Jia Wang, Ruibin Bai, Junkai Ji, Zexuan Zhu

TL;DR

BioLM-Score provides a principled and practical alternative to existing scoring functions, combining efficiency, generalization, and interpretability for structure-based drug discovery.

Abstract

Protein-ligand scoring is a central component of structure-based drug design, underpinning molecular docking, virtual screening, and pose optimization. Conventional physics-based energy functions are often computationally expensive, limiting their utility in large-scale screening. In contrast, deep learning-based scoring models offer improved computational efficiency but frequently suffer from limited cross-target generalization and poor interpretability, which restrict their practical applicability. Here we present BioLM-Score, a simple yet generalizable protein-ligand scoring model that couples geometric modeling with representation learning. Specifically, it employs modality-specific and structure-aware encoders for proteins and ligands, each augmented with biomolecular language models to enrich structural and chemical representations. Subsequently, these representations are integrated through a mixture density network to predict multimodal interatomic distance distributions, from which statistically grounded likelihood-based scores are derived. Evaluations on the CASF-2016 benchmark demonstrate that BioLM-Score achieves significant improvements across docking, scoring, ranking, and screening tasks. Moreover, the proposed scoring function serves as an effective optimization objective for guiding docking protocols and conformational search. In summary, BioLM-Score provides a principled and practical alternative to existing scoring functions, combining efficiency, generalization, and interpretability for structure-based drug discovery.
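As a rough illustration of how a likelihood-based score might be read off a mixture density network's output, the sketch below evaluates observed protein-ligand distances under per-pair Gaussian mixtures and sums the log-likelihoods. The abstract states only that scores are derived from predicted multimodal distance distributions; the Gaussian components, the 5.0 Å cutoff, the plain summation, and the names `mixture_log_likelihood` and `pose_score` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def mixture_log_likelihood(d, pi, mu, sigma):
    """Log-density of each distance under its own K-component Gaussian mixture.

    d:             (P,)   observed pairwise distances in angstroms
    pi, mu, sigma: (P, K) mixture weights, means, and standard deviations
    """
    d = d[:, None]  # broadcast each distance against its K components
    log_comp = (
        np.log(pi)
        - np.log(sigma)
        - 0.5 * np.log(2.0 * np.pi)
        - 0.5 * ((d - mu) / sigma) ** 2
    )
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def pose_score(d, pi, mu, sigma, cutoff=5.0):
    """Sum of log-likelihoods over pairs within `cutoff` (an assumed value)."""
    mask = d <= cutoff
    return float(mixture_log_likelihood(d, pi, mu, sigma)[mask].sum())
```

Under these assumptions a higher score means the pose's distances are more probable under the predicted distributions, which is what would let such a score serve as an objective during pose optimization.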

Paper Structure (36 sections, 7 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: The overall architecture of BioLM-Score. The model processes protein and ligand inputs through dual encoders: a Structure Encoder (GatedGCN/GT) and a Language Encoder (ESM-C for proteins, Chemformer for ligands). The extracted features undergo multimodal fusion to generate residue and atom features. Finally, a Pairwise Distance Mixture Density Network (MDN) predicts the interaction density to compute the final BioLM-Score.
  • Figure 2: Comparison of Docking Success Rates on CASF-2016. Success rate is defined as the percentage of top-1 ranked poses with an RMSD $< 2.0$ Å relative to the crystal structure. Baselines include widely adopted commercial and academic software.
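The success-rate definition in the Figure 2 caption can be sketched as follows. The input layout (a dict mapping each complex to a list of `(score, rmsd)` pairs), the convention that a higher score is better, and the function names are assumptions for illustration; only the top-1 criterion and the strict 2.0 Å threshold come from the caption.

```python
def top1_rmsd(poses):
    """Return the RMSD of the best-scored pose.

    poses: list of (score, rmsd) pairs; assumes a higher score is better.
    """
    return max(poses, key=lambda p: p[0])[1]

def docking_success_rate(complexes, threshold=2.0):
    """Percentage of complexes whose top-1 pose has RMSD strictly below threshold."""
    rmsds = [top1_rmsd(poses) for poses in complexes.values()]
    if not rmsds:
        raise ValueError("no complexes given")
    hits = sum(r < threshold for r in rmsds)
    return 100.0 * hits / len(rmsds)
```

Note that the comparison is strict, so a top-1 pose at exactly 2.0 Å does not count as a success.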