GAN-TAT: A Novel Framework Using Protein Interaction Networks in Druggable Gene Identification
George Yuanji Wang, Srisharan Murugesan, Aditya Prince Rohatgi
TL;DR
Identifying druggable genes from Protein Interaction Networks (PINs) is challenging because PINs are high-dimensional and sparse, and methods that rely on indirect topological features lose information. The authors introduce GAN-TAT, which uses an ImGAGN-GraphSAGE embedding to encode PIN information directly, concatenates that embedding with an extended feature vector, and applies fold-based XGBoost classification to address class imbalance. GAN-TAT achieves state-of-the-art AUC-ROC on three Pharos datasets (e.g., around $0.951$ on Tclin), and its top predictions are enriched for clinically validated targets and GO:BP pathways, supporting clinical relevance. This PIN-centric embedding approach offers a promising direction for target discovery in pharmacogenomics, and the open-source code enables reproducibility and further refinement.
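The fold-based handling of class imbalance and the feature concatenation can be illustrated with a minimal sketch. It assumes, without confirmation from the summary above, that "fold-based" means splitting the majority (non-druggable) class into several chunks and pairing each chunk with all minority (druggable) samples, so that one classifier can be trained per balanced subset. The function names `balanced_folds` and `concat_features` are hypothetical, and the classifier itself (XGBoost in GAN-TAT) is omitted.

```python
import random


def balanced_folds(labels, n_folds, seed=0):
    """Return n_folds index lists, each holding all positives plus one chunk of negatives.

    Assumption: label 1 marks the minority (druggable) class, label 0 the majority class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)
    rng.shuffle(neg)  # randomize which negatives land in which fold
    folds = []
    for k in range(n_folds):
        neg_chunk = neg[k::n_folds]  # every n_folds-th negative, starting at offset k
        folds.append(sorted(pos + neg_chunk))
    return folds


def concat_features(embedding, extra):
    """Join a graph embedding with an additional feature vector into one flat list."""
    return list(embedding) + list(extra)


# Toy data: 2 positives and 8 negatives, split into 4 balanced subsets.
labels = [1, 1] + [0] * 8
folds = balanced_folds(labels, n_folds=4)
```

Each subset here contains both positives and two of the eight negatives, and every negative is used exactly once across the four subsets. In a full pipeline, one model would be trained per subset and the predictions combined, for example by averaging.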
Abstract
Identifying druggable genes is essential for developing effective pharmaceuticals. With the availability of extensive, high-quality data, computational methods have become a significant asset. The Protein Interaction Network (PIN) is a valuable data source but is challenging to use because of its high dimensionality and sparsity. Previous methods integrated the PIN only indirectly, which led to a loss of resolution. This study proposes GAN-TAT, a framework that uses an advanced graph embedding technique, ImGAGN, to integrate the PIN directly for druggable gene inference. Tested on three Pharos datasets, GAN-TAT achieved the highest AUC-ROC score of 0.951 on Tclin. Further evaluation shows that GAN-TAT's predictions are supported by clinical evidence, highlighting its potential for practical application in pharmacogenomics. This research is a methodological attempt at using the PIN directly, and it broadens the range of potential approaches to drug target development. The source code of GAN-TAT is available at https://github.com/george-yuanji-wang/GAN-TAT.
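For readers unfamiliar with the AUC-ROC metric reported above, the following minimal sketch computes it from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not data or code from GAN-TAT.

```python
def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 2 positives, 3 negatives; 5 of the 6 pairs are ranked correctly.
toy_auc = auc_roc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; rank-based implementations give the same value more efficiently on large datasets.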
