MARS: A neurosymbolic approach for interpretable drug discovery
Lauren Nicole DeLong, Yojana Gadiya, Paola Galdi, Jacques D. Fleuriot, Daniel Domingo-Fernández
TL;DR
This work defines MoA deconvolution as an interpretable task on knowledge graphs and introduces MoA-net, a domain-specific KG for drug–biological process associations. It presents MARS, a neurosymbolic retrieval system that learns weights for metapath-based rules while performing RL-guided walks to predict MoA links, thereby offering interpretable MoA paths and rule importance. A key finding is that degree bias can drive predictions in NeSy KG methods, revealing reasoning shortcuts. The authors propose and implement a shortcut-aware mitigation strategy that improves calibration and preserves predictive performance, especially on a trimmed KG, MoA-net-10k. In external validation, MARS_{P_{2H}} achieves competitive, interpretable performance near state-of-the-art levels, with better calibration than baselines and recovery of known MoAs, underscoring its potential for practical, interpretable drug discovery workflows.
Abstract
Neurosymbolic (NeSy) artificial intelligence describes the combination of logic or rule-based techniques with neural networks. Compared to purely neural approaches, NeSy methods often possess enhanced interpretability, which is particularly promising for biomedical applications like drug discovery. However, since interpretability is broadly defined, there are no clear guidelines for assessing the biological plausibility of model interpretations. To assess interpretability in the context of drug discovery, we devise a novel prediction task, called drug mechanism-of-action (MoA) deconvolution, with an associated, tailored knowledge graph (KG), MoA-net. We then develop the MoA Retrieval System (MARS), a NeSy approach for drug discovery which leverages logical rules with learned rule weights. Using this interpretable feature alongside domain knowledge, we find that MARS and other NeSy approaches on KGs are susceptible to reasoning shortcuts, in which the prediction of true labels is driven by "degree bias" rather than by the domain-based rules. We then demonstrate ways to identify and mitigate these shortcuts. With mitigation in place, MARS achieves performance on par with current state-of-the-art models while producing model interpretations aligned with known MoAs.
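To make the two central ideas of the abstract concrete, the following is a minimal Python sketch of weighted metapath rules and of a simple degree check. It is not the MARS implementation: the toy graph, the relation names, and the rule weights are all invented for illustration, and the RL-guided walk is replaced here by exhaustive path enumeration. The sketch only shows how a weighted rule yields both a score and a readable path, and how a high-degree node can accumulate score simply because more paths reach it.

```python
from collections import defaultdict

# Toy triples (head, relation, tail). Every name here is hypothetical.
TRIPLES = [
    ("drugA", "targets", "prot1"),
    ("drugA", "targets", "prot2"),
    ("prot1", "participates_in", "bp_apoptosis"),
    ("prot2", "participates_in", "bp_hub"),
    ("prot2", "interacts_with", "prot3"),
    ("prot3", "participates_in", "bp_hub"),
    ("prot4", "participates_in", "bp_hub"),
    ("prot5", "participates_in", "bp_hub"),
]

# Assumed rule weights, keyed by a metapath (a sequence of relations).
# In a learned system these would be trained; here they are fixed by hand.
RULE_WEIGHTS = {
    ("targets", "participates_in"): 0.8,
    ("targets", "interacts_with", "participates_in"): 0.3,
}


def build_adjacency(triples):
    """Map each head node to its outgoing (relation, tail) edges."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


def follow_metapath(adj, start, relations):
    """Return every node path from `start` that matches the relation sequence."""
    paths = [[start]]
    for rel in relations:
        paths = [p + [t] for p in paths for (r, t) in adj[p[-1]] if r == rel]
    return paths


def score_processes(adj, drug, rule_weights):
    """Sum rule weights over matching paths; keep the paths as evidence."""
    scores = defaultdict(float)
    evidence = defaultdict(list)
    for relations, weight in rule_weights.items():
        for path in follow_metapath(adj, drug, relations):
            scores[path[-1]] += weight
            evidence[path[-1]].append((relations, path))
    return dict(scores), dict(evidence)


def in_degree(triples):
    """Count incoming edges per node, a crude proxy for degree bias."""
    deg = defaultdict(int)
    for _, _, tail in triples:
        deg[tail] += 1
    return dict(deg)


adj = build_adjacency(TRIPLES)
scores, evidence = score_processes(adj, "drugA", RULE_WEIGHTS)
degrees = in_degree(TRIPLES)
for process in sorted(scores, key=scores.get, reverse=True):
    print(process, round(scores[process], 2), "in-degree:", degrees[process])
    for relations, path in evidence[process]:
        print("   via", " -> ".join(path), "using rule", relations)
```

In this toy graph the hub process outranks the more specific one partly because more paths lead to it, which is the kind of pattern a degree check is meant to flag. Comparing rule-based rankings against a ranking by degree alone is one simple way to look for such shortcuts.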
