MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

Shikun Feng; Jiaxin Zheng; Yinjun Jia; Yanwen Huang; Fengfeng Zhou; Wei-Ying Ma; Yanyan Lan

MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

Shikun Feng, Jiaxin Zheng, Yinjun Jia, Yanwen Huang, Fengfeng Zhou, Wei-Ying Ma, Yanyan Lan

TL;DR

This work constructs a large-scale and precise molecular representation dataset of approximately 140,000 small molecules, meticulously designed to capture an extensive array of chemical, physical, and biological properties, derived through a robust computational ligand-target binding analysis pipeline.

Abstract

Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address these issues, we construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules, meticulously designed to capture an extensive array of chemical, physical, and biological properties, derived through a robust computational ligand-target binding analysis pipeline. We conduct extensive experiments on various deep learning models, demonstrating that our dataset offers significant physicochemical interpretability to guide model development and design. Notably, the dataset's properties are linked to binding affinity metrics, providing additional insights into model performance in drug-target interaction tasks. We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning, thereby expediting progress in the field of artificial intelligence-driven drug discovery.

MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

TL;DR

Abstract

MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (3)