MARS: A neurosymbolic approach for interpretable drug discovery

Lauren Nicole DeLong, Yojana Gadiya, Paola Galdi, Jacques D. Fleuriot, Daniel Domingo-Fernández

TL;DR

This work defines mechanism-of-action (MoA) deconvolution as an interpretable task on knowledge graphs (KGs) and introduces MoA-net, a domain-specific KG for drug–biological process associations. It presents MARS, a neurosymbolic (NeSy) retrieval system that learns weights for metapath-based rules while performing RL-guided walks to predict MoA links, thereby offering interpretable MoA paths and rule importance. A key finding is that degree bias can drive predictions in NeSy KG methods, revealing reasoning shortcuts. The authors propose and implement a shortcut-aware mitigation strategy that improves calibration and preserves predictive performance, especially on a trimmed KG, MoA-net-10k. External validation shows $\text{MARS}_{P_{2H}}$ achieving competitive, interpretable performance near state-of-the-art levels, with better calibration than baselines and recovery of known MoAs, underscoring its potential for practical, interpretable drug discovery workflows.

Abstract

Neurosymbolic (NeSy) artificial intelligence describes the combination of logic or rule-based techniques with neural networks. Compared to neural approaches, NeSy methods often possess enhanced interpretability, which is particularly promising for biomedical applications like drug discovery. However, since interpretability is broadly defined, there are no clear guidelines for assessing the biological plausibility of model interpretations. To assess interpretability in the context of drug discovery, we devise a novel prediction task, called drug mechanism-of-action (MoA) deconvolution, with an associated, tailored knowledge graph (KG), MoA-net. We then develop the MoA Retrieval System (MARS), a NeSy approach for drug discovery which leverages logical rules with learned rule weights. Using this interpretable feature alongside domain knowledge, we find that MARS and other NeSy approaches on KGs are susceptible to reasoning shortcuts, in which the prediction of true labels is driven by "degree-bias" rather than the domain-based rules. Subsequently, we demonstrate ways to identify and mitigate this. Thereafter, MARS achieves performance on par with current state-of-the-art models while producing model interpretations aligned with known MoAs.
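
One way to probe for the degree bias described above is to compare a model's metrics on the original KG with its metrics on a copy whose edges are shuffled while every node keeps its degree. The sketch below shows a generic double-edge swap for the edges of a single relation type. It is an illustrative assumption, not the authors' procedure: this page does not say how the permuted KG was built, and the function name `degree_preserving_shuffle` and the toy edges are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]  # (head, tail) for one relation type


def degree_preserving_shuffle(edges: Sequence[Edge], n_swaps: int, seed: int = 0) -> List[Edge]:
    """Randomize edges by double-edge swaps.

    Each accepted swap turns (a, b), (c, d) into (a, d), (c, b), so every
    node keeps its out-degree and in-degree. Assumes the input edges are
    unique and that there are at least two of them.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    present = set(shuffled)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(shuffled)), 2)
        (a, b), (c, d) = shuffled[i], shuffled[j]
        new_first, new_second = (a, d), (c, b)
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new_first in present or new_second in present:
            continue
        present -= {shuffled[i], shuffled[j]}
        present |= {new_first, new_second}
        shuffled[i], shuffled[j] = new_first, new_second
    return shuffled


# Hypothetical toy edges (drug -> protein), for illustration only.
toy_edges = [("d1", "p1"), ("d1", "p2"), ("d2", "p3"), ("d3", "p1"), ("d3", "p4")]
permuted = degree_preserving_shuffle(toy_edges, n_swaps=100, seed=1)
```

If a model scores about as well on such a permuted graph as on the original, its predictions cannot be relying on which specific entities are connected, which points to degree as the driving signal.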

Paper Structure

This paper contains 27 sections, 8 equations, 9 figures, 6 tables, and 1 algorithm.

Figures (9)

  • Figure 1: MoA of cortisone acetate. Cortisone acetate upregulates the activity of the glucocorticoid (GC) receptor protein, which, in turn, downregulates the cyclooxygenase (COX) protein. Since COX is directly involved in creating inflammation, its inhibition reduces inflammation, thereby treating keratitis (DrugMechDB). Data regarding protein interactions and biological processes (left) can be collected in a laboratory setting, whereas physiological effects like indications (right) are obtained during or after clinical trials. Created in https://BioRender.com.
  • Figure 2: Overview of the MoA retrieval system (MARS). Created in https://BioRender.com.
  • Figure 3: Hits@10 and MRR for $\text{MARS}_{P_{2H}}$ compared to PoLo and $\text{MARS}_{\text{naive}}$ on several variants of MoA-net. Each bar shows the average and standard deviation across five independent training and testing iterations. From left to right: The standard metrics on MoA-net-permuted (B) differ little from the initial metrics on MoA-net (A), which provides evidence that predictions are influenced by degree bias, resulting in a reasoning shortcut. Inverse edges were then removed to prohibit the reasoning shortcut, which hindered performance (C). Performance was restored on MoA-net-10k with the KG trimming step (D), with $\text{MARS}_{P_{2H}}$ showing the best standard and pruned metrics. Finally, $\text{MARS}_{P_{2H}}$ maintains high pruned metrics even when inverse edges (and reasoning shortcuts) are re-introduced (E).
  • Figure 4: Metapath-based rule weights from $\text{MARS}_{P_{2H}}$ on MoA-net (Fig. 3A). Each bar shows the average and standard error across five independent training and testing iterations. Paths involving consecutive PPIs ($\texttt{interacts}(Protein, Protein)$), the most common relation type, have consistently lower weights.
  • Figure 5: DrugMechDB paths extracted between drugs and BPs.
  • ...and 4 more figures
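
The Hits@10 and MRR metrics reported in Figure 3 are standard ranking metrics for link prediction. As a minimal sketch, assuming each test query yields the 1-indexed rank of its true target entity, they can be computed as follows. The function names and the example ranks are hypothetical, and the sketch does not cover the distinction between standard and pruned metrics, which is not defined on this page.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of queries whose true entity is ranked in the top k (ranks are 1-indexed)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all queries (ranks are 1-indexed)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the true target for five test queries.
example_ranks = [1, 3, 12, 2, 50]
hits10 = hits_at_k(example_ranks, k=10)    # 3 of 5 ranks are <= 10
mrr = mean_reciprocal_rank(example_ranks)  # (1 + 1/3 + 1/12 + 1/2 + 1/50) / 5
```

Hits@10 only records whether the true entity reaches the top ten, whereas MRR also rewards placing it nearer the top, so the two can diverge when many correct answers sit at ranks just inside the cutoff.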