Towards Evolutionary-based Automated Machine Learning for Small Molecule Pharmacokinetic Prediction
Alex G. C. de Sá, David B. Ascher
TL;DR
The paper addresses the lack of personalized AutoML for small-molecule pharmacokinetic (PK) prediction by proposing a grammar-based genetic programming (GP) framework that automatically constructs PK prediction pipelines from a rich search space of molecular representations, preprocessing steps, and ML models. It demonstrates that AutoML can select diverse, high-performing pipelines and achieve predictive performance comparable to or better than established baselines such as pkCSM and XGBoost, as measured by the Matthews correlation coefficient (MCC) across multiple PK datasets. Key contributions include a search space defined by a context-free grammar (CFG), a GP-driven optimization method with cross-validation-based fitness, and an extensive analysis of pipeline components and their impact on performance. The work offers a scalable, automated tool for accelerating small-molecule PK prediction, with potential for customization to diverse PK tasks and datasets.
Abstract
Machine learning (ML) is revolutionising drug discovery by expediting the prediction of small molecule properties essential for developing new drugs. These properties, including absorption, distribution, metabolism and excretion (ADME), are crucial in the early stages of drug development since they provide an understanding of the course of the drug in the organism, i.e., the drug's pharmacokinetics. However, existing methods lack personalisation and rely on manually crafted ML algorithms or pipelines, which can introduce inefficiencies and biases into the process. To address these challenges, we propose a novel evolutionary-based automated ML (AutoML) method specifically designed for predicting small molecule properties, with a particular focus on pharmacokinetics. Leveraging the advantages of grammar-based genetic programming, our AutoML method streamlines the process by automatically selecting algorithms and designing predictive pipelines tailored to the particular characteristics of the input molecular data. Results demonstrate AutoML's effectiveness in selecting diverse ML algorithms, resulting in predictive performance comparable to, or even better than, that of conventional approaches. By offering personalised ML-driven pipelines, our method promises to enhance small molecule research in drug discovery, providing researchers with a valuable tool for accelerating the development of novel therapeutic drugs.
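
To make the idea of a grammar-defined pipeline search space more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy grammar whose component names (`fingerprints`, `scaling`, `random_forest`, and so on) are illustrative placeholders rather than the actual search space, and it scores a candidate by the mean Matthews correlation coefficient over cross-validation folds. The helper names `derive`, `mcc`, and `cv_fitness` are hypothetical, and the evolutionary operators (selection, crossover, mutation) are omitted.

```python
import math
import random

# Hypothetical toy grammar: each non-terminal maps to a list of alternative
# productions. Component names are placeholders, not the paper's search space.
GRAMMAR = {
    "<pipeline>": [["<representation>", "<preprocessing>", "<model>"]],
    "<representation>": [["fingerprints"], ["descriptors"]],
    "<preprocessing>": [["none"], ["scaling"], ["feature_selection"]],
    "<model>": [["random_forest"], ["svm"], ["xgboost"]],
}


def derive(symbol, rng):
    """Expand a grammar symbol into a flat list of terminals by random choice."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to expand
    production = rng.choice(GRAMMAR[symbol])
    terminals = []
    for part in production:
        terminals.extend(derive(part, rng))
    return terminals


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def cv_fitness(fold_results):
    """Mean MCC over folds; each fold is a (y_true, y_pred) pair."""
    scores = [mcc(y_true, y_pred) for y_true, y_pred in fold_results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    print("candidate pipeline:", derive("<pipeline>", rng))
    folds = [([1, 1, 0, 0], [1, 1, 0, 0]), ([1, 0, 1, 0], [1, 0, 0, 0])]
    print("fitness (mean MCC):", cv_fitness(folds))
```

In a full grammar-based GP loop, each individual in the population would be a derivation like the one returned by `derive`, and `cv_fitness` would be computed from the predictions of the corresponding trained pipeline on held-out folds.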
