Rep3Net: An Approach Exploiting Multimodal Representation for Molecular Bioactivity Prediction

Sabrina Islam, Md. Atiqur Rahman, Md. Bakhtiar Hasan, Md. Hasanul Kabir

TL;DR

The paper addresses the challenge of predicting PARP-1 inhibitors by proposing Rep3Net, a multimodal deep learning architecture that fuses molecular descriptors, graph-based spatial features, and ChemBERTa SMILES embeddings to predict pIC50 values. It details dataset curation from ChEMBL, preprocessing, and the three input representations, followed by a fusion-based regression framework that outperforms classical QSAR and standard GNN baselines while remaining computationally efficient. Through ablation studies and significance testing, the work demonstrates that combining diverse molecular views yields the strongest predictive signals, particularly in data-scarce regimes relevant to early drug discovery. The approach holds promise for accelerating virtual screening of PARP-1 inhibitors and can be extended with larger chemical spaces and explainable AI techniques.
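
For readers unfamiliar with the regression target, pIC50 is the negative base-10 logarithm of the IC50 expressed in molar units, so 1 nM corresponds to a pIC50 of 9 and 1 µM to a pIC50 of 6. The sketch below shows that conversion. It assumes the IC50 is supplied in nanomolar, and the function name `ic50_nm_to_pic50` is hypothetical; the paper's own preprocessing code is not shown in this summary.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar).

    Assumption: the input is in nM, so IC50[M] = IC50[nM] * 1e-9 and
    pIC50 = 9 - log10(IC50[nM]).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


# Hypothetical example values, not taken from the paper's dataset.
for value_nm in (1.0, 50.0, 1000.0):
    print(value_nm, "nM ->", round(ic50_nm_to_pic50(value_nm), 3))
```

Higher pIC50 means a more potent inhibitor, which is why regression models are usually trained on this log scale rather than on raw IC50 values.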

Abstract

In early-stage drug discovery, bioactivity prediction of molecules against target proteins plays a crucial role. Traditional QSAR models that utilize molecular descriptor-based data often struggle to predict the bioactivity of molecules effectively because of their limited ability to capture the structural and contextual information embedded within each compound. To address this challenge, we propose Rep3Net, a unified deep learning architecture that not only incorporates descriptor data but also includes spatial and relational information through a graph-based representation of compounds and contextual information through ChemBERTa-generated embeddings from SMILES strings. Our model, which employs concatenated multimodal features, produces reliable bioactivity predictions on a Poly [ADP-ribose] polymerase 1 (PARP-1) dataset. PARP-1 is a crucial agent in DNA damage repair and has become a significant therapeutic target in malignancies that depend on it for survival and growth. A comprehensive analysis and comparison with conventional standalone models, including GCN, GAT, and XGBoost, demonstrate that our architecture achieves the highest predictive performance. In computational screening of compounds in drug discovery, our architecture provides a scalable framework for bioactivity prediction.
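
The concatenation-based fusion described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the feature dimensions, the single hidden layer, the random untrained weights, and all function names are assumptions chosen only to show how three per-molecule feature vectors are joined and passed to a regression head.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_predict(descriptors, graph_emb, smiles_emb, layers):
    """Concatenate the three views and regress a single scalar."""
    fused = list(descriptors) + list(graph_emb) + list(smiles_emb)
    (w1, b1), (w2, b2) = layers
    hidden = relu(linear(fused, w1, b1))
    return linear(hidden, w2, b2)[0]


rng = random.Random(0)
dims = (8, 4, 6)  # hypothetical sizes: descriptors, graph, SMILES embedding
layers = [init_layer(sum(dims), 16, rng), init_layer(16, 1, rng)]

# Stand-ins for real features (e.g. computed descriptors, a pooled GNN
# output, and a ChemBERTa embedding); random values are used here.
descriptors = [rng.random() for _ in range(dims[0])]
graph_emb = [rng.random() for _ in range(dims[1])]
smiles_emb = [rng.random() for _ in range(dims[2])]

prediction = fuse_and_predict(descriptors, graph_emb, smiles_emb, layers)
print("untrained pIC50 estimate:", prediction)
```

Because the weights are untrained, the printed number carries no meaning; the point is only that each modality contributes its own block of the fused vector, so any one block can be dropped to mimic an ablation.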

Paper Structure

This paper contains 24 sections, 10 equations, 1 figure, 4 tables, 1 algorithm.

Figures (1)

  • Figure 1: Proposed Architecture