A Comprehensive Review of Emerging Approaches in Machine Learning for De Novo PROTAC Design

Yossra Gharbi; Rocío Mercado

TL;DR

The paper surveys machine learning approaches for de novo PROTAC design. It focuses first on the specialized challenges of PROTAC linker design and then on holistic PROTAC design that jointly optimizes the warhead, E3 ligase ligand, and linker. It reviews 2D and 3D generative models, reinforcement learning, and degradation-activity surrogates. It also highlights key datasets such as PROTAC-DB and PROTACpedia, and notes the limitations imposed by data scarcity and by reliance on models trained on small molecules. The authors underscore the critical role of 3D information and ternary-complex modeling in PROTAC design and discuss the current limitations of existing ML tools when applied to this modality. They also point to emerging directions such as diffusion models and transfer learning. The work provides a roadmap for future ML-driven PROTAC engineering, emphasizing tailored datasets, physics-informed modeling, and methods capable of capturing the spatial dynamics essential for effective targeted protein degradation.

Abstract

Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery. It aims to regulate intracellular protein levels by harnessing the cell's innate degradation pathways to selectively target and degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs), which are at the heart of TPD strategies, leverage the ubiquitin-proteasome system to selectively target pathogenic proteins for proteasomal degradation. As the field evolves, it is becoming increasingly apparent that traditional methodologies for designing such complex molecules have limitations. This has motivated the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we explore the impact of ML on de novo PROTAC design, an aspect of molecular design that has not been comprehensively reviewed despite its significance. We examine the distinct characteristics of PROTAC linker design and underscore the complexities involved in creating effective bifunctional molecules capable of TPD. We then examine how ML methods for fragment-based drug design (FBDD), honed in the realm of small-molecule drug discovery, are paving the way for PROTAC linker design. We also critically evaluate the limitations inherent in applying these methods to the complex field of PROTAC development. Moreover, we review existing ML works applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for researchers pursuing better design strategies for this new modality.