Barlow Twins Deep Neural Network for Advanced 1D Drug-Target Interaction Prediction

Maximilian G. Schuh, Davide Boldini, Annkathrin I. Bohne, Stephan A. Sieber

TL;DR

A computationally efficient and effective hybrid approach, combining the deep learning model Barlow Twins and gradient boosting machines, outperforms state-of-the-art methods across multiple splits and benchmarks using only one-dimensional input.

Abstract

Accurate prediction of drug-target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive performance against multiple established benchmarks using only one-dimensional input. The use of a gradient boosting machine as the underlying predictor ensures fast and efficient predictions without the need for substantial computational resources. We also investigate how the model reaches its decision based on individual training samples. By comparing co-crystal structures, we find that BarlowDTI effectively exploits catalytically active and stabilising residues, highlighting the model's ability to generalise from one-dimensional input data. In addition, we benchmark new baselines against existing methods. Together, these innovations improve the efficiency and effectiveness of drug-target interaction predictions, providing robust tools for accelerating drug development and deepening the understanding of molecular interactions. We also provide an easy-to-use web interface that can be freely accessed at https://www.bio.nat.tum.de/oc2/barlowdti.
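
As a rough illustration of the Barlow Twins objective named in the abstract, the following is a minimal NumPy sketch of the generic loss from the original Barlow Twins formulation: the cross-correlation matrix of two batch-standardised embeddings is pushed towards the identity matrix. It is not the authors' implementation. The function name `barlow_twins_loss`, the argument names, and the default off-diagonal weight of 5e-3 are assumptions made for illustration, and how BarlowDTI constructs its two views is not specified here.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-8):
    """Generic Barlow Twins loss for two embedding batches of shape (N, D).

    Hypothetical helper for illustration; the weight `lambda_offdiag`
    is an assumed default, not a value taken from the BarlowDTI paper.
    """
    n = z_a.shape[0]
    # Standardise every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Empirical cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries should be 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag


# Illustrative usage with random data (not real drug or target embeddings).
rng = np.random.default_rng(0)
view_a = rng.normal(size=(256, 8))
view_b = rng.normal(size=(256, 8))
loss_identical = barlow_twins_loss(view_a, view_a)    # views agree perfectly
loss_independent = barlow_twins_loss(view_a, view_b)  # views are unrelated
```

In this sketch, two identical views yield a loss close to zero, while two unrelated views are penalised mainly through the diagonal term.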

Paper Structure (26 sections, 1 equation, 3 figures, 4 tables)

This paper contains 26 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: BarlowDTI architecture. Drug and target serve as 1D inputs that are processed and converted into vectors. Molecules are provided as SMILES and converted to ECFP, while the primary amino acid sequence is vectorised using a bilingual 3D structure-aware PLM. The Barlow Twins architecture learns to represent DTI: its objective function pushes the cross-correlation matrix of the two DTI representations as close as possible to the identity matrix. Finally, this DL model is used as a feature extractor, and a GBM is trained on the embeddings and the interaction labels. The GBM is then used as the predictor.
  • Figure 2: A comparison of the performance of methods established in the literature. a) The state-of-the-art performance of BarlowDTI in terms of PR_AUC was visualised in comparison to other models (for metrics and their statistics, refer to \ref{tab:metrics}). b) The change in performance was examined as key elements of the BarlowDTI architecture were incrementally removed. c) The newly introduced model baseline, XGBoost, was compared with other established methods. A per-dataset and per-split difference in PR_AUC was calculated relative to the BarlowDTI performance (in b) or the baseline model (in c). The overall change was tested for statistical significance (****$p < 0.0001$, two-sided Welch's $t$-test \cite{welch1947generalization, virtanen2020scipy} with Benjamini-Hochberg \cite{benjamini1995controlling} multiple testing correction).
  • Figure 3: Structure-based explanation of BarlowDTIXXL predictions. a) Co-crystal structures of lplA1 and LipL1 with LA as ligand are shown in superposition, together with the most influential training sample. b) The squared Pearson $R$ correlation \cite{pearson1895note} of BarlowDTIXXL and ITC measurements is presented \cite{dienemann2023chemical}. c) The protein residue--ligand interactions at the active site are compared. d) We identified the most influential training samples for LA predictions. The distribution of Jaccard similarity for all training samples is shown. We applied kernel density estimation to the histogram to improve visibility, due to the large training set size. e) The most influential training samples are highlighted ($\downarrow$).
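
The Benjamini-Hochberg multiple testing correction mentioned in the Figure 2 caption can be sketched in plain Python as follows. This is a generic version of the procedure, not the authors' analysis code. The function name `benjamini_hochberg` and the example p-values are invented for illustration. In practice, the raw p-values could come from a Welch's t-test, for instance `scipy.stats.ttest_ind` with `equal_var=False`.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper for illustration; expects a list of raw p-values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone: each is the minimum of p * m / rank over all
    # ranks at or above the current one, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, used only to show the call.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

A comparison counts as significant at level alpha when its adjusted p-value falls below alpha, which controls the false discovery rate across all comparisons.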