G2DR: A Genotype-First Framework for Genetics-Informed Target Prioritization and Drug Repurposing

Muhammad Muneeb, David B. Ascher

Abstract

Human genetics offers a promising route to therapeutic discovery, yet practical frameworks translating genotype-derived signal into ranked target and drug hypotheses remain limited, particularly when matched disease transcriptomics are unavailable. Here we present G2DR, a genotype-first prioritization framework propagating inherited variation through genetically predicted expression, multi-method gene-level testing, pathway enrichment, network context, druggability, and multi-source drug-target evidence integration. In a migraine case study with 733 UK Biobank participants under stratified five-fold cross-validation, we imputed expression across seven transcriptome-weight resources and ranked genes using a reproducibility-aware discovery score from training and validation data, followed by a balanced integrated score for target selection. Discovery-based prioritization generalized to held-out data, achieving gene-level ROC-AUC of 0.775 and PR-AUC of 0.475, while retaining enrichment for curated migraine biology. Mapping prioritized genes to compounds via Open Targets, DGIdb, and ChEMBL yielded drug sets enriched for migraine-linked compounds relative to a global background, though recovery favoured broader mechanism-linked and off-label space over migraine-specific approved therapies. Directionality filtering separated broadly recovered compounds from mechanistically compatible candidates. G2DR is a modular framework for genetics-informed hypothesis generation, not a clinically actionable recommendation system. All outputs require independent experimental, pharmacological, and clinical validation.

Paper Structure (28 sections, 2 figures, 8 tables)

Figures (2)

  • Figure 1: Overview of the G2DR framework. Genotype and phenotype data from the migraine cohort were partitioned by stratified cross-validation and propagated through genotype-based transcriptome imputation across multiple expression-weight resources. Covariate-adjusted predicted expression values were tested using multiple differential-expression and association methods, and significant results from the training and validation splits were aggregated into a discovery set. Genes were then ranked using a composite score that integrated reproducibility, effect magnitude, and statistical confidence, followed by pathway, network, and druggability annotation to generate an integrated target-prioritization score. Top-ranked genes were mapped to candidate compounds through Open Targets, DGIdb, and ChEMBL, and the resulting drug lists were evaluated against curated migraine-associated drug references.
  • Figure 2: Cross-database concordance of predicted gene expression across expression-weight databases. (A) Hierarchical clustering of databases using distance $(1-r)$ derived from the pairwise correlation matrix. (B) Pairwise correlation matrix (Pearson $r$).
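
The distance used in Figure 2A can be illustrated with a minimal sketch: compute Pearson $r$ between each pair of databases, convert it to $1-r$, and merge the closest clusters step by step. The caption does not state the linkage rule, so average linkage is an assumption here, and the database names (`db_A`, `db_B`, `db_C`), the toy expression values, and the helper functions are all hypothetical rather than taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(profiles):
    """Map every unordered pair of profile names to the distance 1 - r."""
    return {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


def mean_distance(dist, a, b):
    """Average-linkage distance between clusters a and b (tuples of names)."""
    total = sum(dist[frozenset((i, j))] for i in a for j in b)
    return total / (len(a) * len(b))


def average_linkage(profiles):
    """Agglomerative clustering on 1 - r; linkage rule is an assumption.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    dist = correlation_distance(profiles)
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: mean_distance(dist, pair[0], pair[1]),
        )
        merges.append((a, b, mean_distance(dist, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical predicted-expression vectors, one value per gene.
toy = {
    "db_A": [1.0, 2.0, 3.0, 4.0],
    "db_B": [1.1, 2.1, 2.9, 4.2],
    "db_C": [4.0, 1.0, 3.5, 0.5],
}
merges = average_linkage(toy)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

In this toy input, `db_A` and `db_B` are nearly collinear and merge first at a distance close to 0, while `db_C` is negatively correlated with both and joins last at a distance above 1. Because $1-r$ ranges from 0 to 2, anticorrelated databases end up farther apart than uncorrelated ones.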