GAN-TAT: A Novel Framework Using Protein Interaction Networks in Druggable Gene Identification

George Yuanji Wang, Srisharan Murugesan, Aditya Prince Rohatgi

TL;DR

Identifying druggable genes from Protein Interaction Networks (PINs) is challenging because the networks are high-dimensional and sparse, and earlier methods that integrated them only through indirect topological features lost resolution. The authors introduce GAN-TAT, which uses an ImGAGN-GraphSAGE embedding to encode PIN information directly, concatenates that embedding with an extended feature vector, and applies fold-based XGBoost classification to address class imbalance. GAN-TAT achieves state-of-the-art AUC-ROC on three Pharos datasets (e.g., $0.951$ on Tclin), and its top predictions are enriched for clinically validated targets and GO:BP pathways, which supports their clinical relevance. This PIN-centric embedding approach offers a promising direction for drug-target discovery in pharmacogenomics, and the open-source code enables reproduction and further refinement.
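The concatenation and fold-based steps mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`concat_features`, `make_balanced_folds`, `ensemble_scores`) are hypothetical, and it assumes one common reading of "fold-based" training, in which the majority class is split into folds, each fold is paired with all minority-class samples, one classifier is trained per subset, and the scores are averaged. The classifier is passed in as a callable `fit`, so any model (for example a wrapper around XGBoost) could be plugged in.

```python
import random


def concat_features(embedding, extended):
    """Concatenate each node's graph embedding with its extended feature vector."""
    return [list(e) + list(x) for e, x in zip(embedding, extended)]


def make_balanced_folds(labels, seed=0):
    """Split majority-class (label 0) indices into folds and pair each fold
    with all minority-class (label 1) indices.

    Assumption: positives are the minority class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    rng.shuffle(neg)
    # Roughly one fold per multiple of the minority-class size.
    n_folds = max(1, len(neg) // len(pos))
    folds = [neg[k::n_folds] for k in range(n_folds)]
    return [sorted(pos + fold) for fold in folds]


def ensemble_scores(X, y, fit, seed=0):
    """Train one model per balanced subset and average the scores.

    `fit(X_subset, y_subset)` is a hypothetical callable that returns a
    scoring function mapping one feature row to a float.
    """
    subsets = make_balanced_folds(y, seed)
    models = [fit([X[i] for i in idx], [y[i] for i in idx]) for idx in subsets]
    return [sum(m(row) for m in models) / len(models) for row in X]
```

Each subset seen by a classifier is close to balanced, while every negative sample is still used by exactly one model in the ensemble.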

Abstract

Identifying druggable genes is essential for developing effective pharmaceuticals. With the availability of extensive, high-quality data, computational methods have become a significant asset. The Protein Interaction Network (PIN) is a valuable data source but is challenging to use because of its high dimensionality and sparsity. Previous methods relied on indirect integration, leading to resolution loss. This study proposes GAN-TAT, a framework that uses an advanced graph embedding technique, ImGAGN, to integrate the PIN directly for druggable gene inference. Tested on three Pharos datasets, GAN-TAT achieved its highest AUC-ROC score, 0.951, on Tclin. Further evaluation shows that GAN-TAT's predictions are supported by clinical evidence, highlighting its potential practical applications in pharmacogenomics. This research is a methodological attempt at direct use of the PIN, broadening the set of potential approaches to drug target development. The source code of GAN-TAT is available at https://github.com/george-yuanji-wang/GAN-TAT.

Paper Structure (9 sections, 2 figures, 1 table)

This paper contains 9 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: A) Illustration of the GAN-TAT architecture. The upstream module embeds the PIN and generates an extended feature set. The downstream module partitions the dataset and trains classifiers. B) Designs of ImGAGN-GraphSAGE, with graph generator, encoder, and discriminator.
  • Figure 2: A) A bar graph representing the overlap between Tclin drug genes and top predictions of GAN-TAT. B) Comparison of enrichment analysis between top $5\%$ ranked genes and Tclin.