Automated Reasoning in Systems Biology: a Necessity for Precision Medicine

Pedro Zuidberg Dos Martires, Vincent Derkinderen, Luc De Raedt, Marcus Krantz

TL;DR

It is argued that the fields of knowledge representation (KR) and systems biology (SysBio) exhibit important overlaps that have been largely ignored so far, and that the formal representation of domain knowledge is a natural meeting place for SysBio and KR.

Abstract

Recent developments in AI have reinvigorated pursuits to advance the (life) sciences using AI techniques, thereby creating a renewed opportunity to bridge different fields and find synergies. Headlines for AI and the life sciences have been dominated by data-driven techniques, for instance, to solve protein folding with next to no expert knowledge. In contrast to this, we argue for the necessity of a formal representation of expert knowledge, either to develop explicit scientific theories or to compensate for the lack of data. Specifically, we argue that the fields of knowledge representation (KR) and systems biology (SysBio) exhibit important overlaps that have been largely ignored so far. This, in turn, means that relevant scientific questions are ready to be answered using the right domain knowledge (SysBio), encoded in the right way (SysBio/KR), and combined with modern automated reasoning tools (KR). Hence, the formal representation of domain knowledge is a natural meeting place for SysBio and KR. On the one hand, we argue that such an interdisciplinary approach will advance the field of SysBio by exposing it to industrial-grade reasoning tools, thereby allowing novel scientific questions to be tackled. On the other hand, we see ample opportunities to move the state of the art in KR by tailoring KR methods to the field of SysBio, which comes with challenging problem characteristics, e.g. scale, partial knowledge, noise, or sub-symbolic data. We stipulate that this proposed interdisciplinary research is necessary to attain a prominent long-term goal in the health sciences: precision medicine.

Paper Structure

This paper contains 6 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: Illustration of a site-specific state change: a phosphate group from a molecule (ATP) is attached to a specific position (also called a site) of a protein (also called a component). The depicted component has two sites, indicated in blue: $S_0$ (bottom left) and $S_1$ (top right).
  • Figure 2: The NLRP3 inflammasome signalling pathway constitutes a tiny part of the human STN and is a key regulator of inflammation processes in human cells. We depict the assembly and activation of the inflammasome in mechanistic detail using a reaction-contingency-based formalism (cf. Section \ref{sec:sota}). The figure is reproduced from krantz2023detailed.
  • Figure 3: Bipartite Boolean simulation of the NLRP3 inflammasome pathway from Figure \ref{fig:inflamma}. Each row represents the value of a Boolean model variable (e.g. a modification at a component site, or a reaction) over time, where black corresponds to true and white to false. The three grey columns indicate interventions in the system, where at specific time points a subset of the state variables was set by hand. Each intervention simulates exposure of the system to a different signal; together (but not individually) these signals trigger inflammasome activation. The simulation shows that only after the third intervention (third grey column) are the conditions set such that the pathway is active (indicated by the state variables in the blue boxes turning true). For further details we refer the reader to krantz2023detailed, from which we reproduced the figure.
  • Figure 4: In the middle at the top (Panel 1) we have a fragment of a reaction-contingency network with two reactions (red nodes) and one site-specific state (blue node). Using the bipartite Boolean network language Romers2020 we can express these as temporal rules in propositional logic (Panel 2). These rules can then be used in a straightforward fashion to simulate the system over time by providing initial conditions and recursively computing the left-hand side of the equivalences. We can see such a simulation in Panel 4, where we give the trajectories not only for the fragment in Panel 1 but for the entire set of variables involved in the pathway. Alternatively, we can use probabilistic logics (Panel 3). Simulating the pathway using these probabilistic transition rules leads to simulations in which site and reaction variables are no longer deterministically true or false. This is indicated by the grey cells in Panel 5. Using these probabilities one can perform a quantitative analysis of the system over time, which is not possible with non-probabilistic representations.
  • Figure 5: Schematic overview of our vision for achieving precision medicine: bio-medical background knowledge provides the structure for modelling signal transduction networks. This structure is then parametrized using generic (population-level) data. Once this has been achieved, the generic model is refined on patient-specific data, for example by performing conditional probabilistic inference or by fine-tuning parameters via gradient-based learning.
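
The simulation scheme described in the caption of Figure 4 can be illustrated with a minimal Python sketch. It is not the authors' model: the variables (`signal`, `r1`, `s`, `r2`), the three update rules, and the firing probabilities `p_r1` and `p_r2` are hypothetical and chosen only to mirror a fragment with two reactions and one site-specific state. The probabilistic variant additionally assumes that variables are independent at each time step, which is an approximation and not necessarily what a probabilistic logic would compute.

```python
from typing import Dict, List

# Hypothetical toy fragment (an assumption, not taken from the paper):
#   r1(t+1) <-> signal(t)        reaction r1 fires when the signal is present
#   s(t+1)  <-> r1(t) or s(t)    state s is produced by r1 and then persists
#   r2(t+1) <-> s(t)             reaction r2 requires state s


def boolean_step(x: Dict[str, bool]) -> Dict[str, bool]:
    """One synchronous update: every rule reads the values at time t."""
    return {
        "signal": x["signal"],  # the input is held fixed
        "r1": x["signal"],
        "s": x["r1"] or x["s"],
        "r2": x["s"],
    }


def simulate(init: Dict[str, bool], steps: int) -> List[Dict[str, bool]]:
    """Deterministic trajectory, starting from the initial conditions."""
    traj = [dict(init)]
    for _ in range(steps):
        traj.append(boolean_step(traj[-1]))
    return traj


def prob_step(p: Dict[str, float], p_r1: float, p_r2: float) -> Dict[str, float]:
    """Update marginal probabilities of each variable being true.

    Assumes independence between variables at time t (an approximation).
    p_r1 and p_r2 are hypothetical probabilities that a reaction fires
    when its condition holds.
    """
    return {
        "signal": p["signal"],
        "r1": p_r1 * p["signal"],
        "s": 1.0 - (1.0 - p["r1"]) * (1.0 - p["s"]),  # noisy-or of r1 and s
        "r2": p_r2 * p["s"],
    }


def simulate_prob(
    init: Dict[str, float], steps: int, p_r1: float = 0.8, p_r2: float = 0.9
) -> List[Dict[str, float]]:
    """Trajectory of marginal probabilities under the rules above."""
    traj = [dict(init)]
    for _ in range(steps):
        traj.append(prob_step(traj[-1], p_r1, p_r2))
    return traj


if __name__ == "__main__":
    for t, x in enumerate(simulate(
        {"signal": True, "r1": False, "s": False, "r2": False}, 3
    )):
        print(t, x)
    for t, p in enumerate(simulate_prob(
        {"signal": 1.0, "r1": 0.0, "s": 0.0, "r2": 0.0}, 3
    )):
        print(t, {k: round(v, 3) for k, v in p.items()})
```

With both firing probabilities set to 1.0, the probabilistic trajectory coincides with the Boolean one; with smaller values the entries become fractional, which corresponds to the grey cells described for Panel 5.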