Active learning of digenic functions with boolean matrix logic programming

Lun Ai, Stephen H. Muggleton, Shi-shun Liang, Geoff S. Baldwin

TL;DR

This paper describes Boolean Matrix Logic Programming (BMLP), a novel approach that efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. It enables rapid optimisation of metabolic models and offers a realistic approach to a self-driving lab for microbial engineering.

Abstract

We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, based on comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs). Predicted host behaviours are not always correctly described by GEMs. Learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models and offers a realistic approach to a self-driving lab for microbial engineering.

Paper Structure (8 sections, 1 theorem, 6 equations, 3 figures)

Key Result

theorem 1

(Active learning sample complexity bound ai_boolean_2024) For some $\phi \in [0, \frac{1}{2}]$ and a small hypothesis error $\epsilon > 0$, if an active version space learner can select instances to label from an instance space $\mathcal{X}$ with minimal reduction ratios greater than or equal to $\phi$ […] where $c$ is a constant and $s_{passive}$ is the sample complexity of learning from randomly labelled […]
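The idea of an active version space learner can be illustrated with a minimal Python sketch. It is not the $BMLP_{active}$ implementation. It assumes that the reduction ratio of an instance is the smaller fraction of hypotheses ruled out by either possible label, which is consistent with $\phi \in [0, \frac{1}{2}]$ but is an assumption here. The names `reduction_ratio`, `active_learn` and the toy threshold hypotheses are hypothetical.

```python
def reduction_ratio(version_space, x):
    """Smaller fraction of hypotheses eliminated whichever label x receives.

    Each hypothesis is a frozenset of the instances it labels positive
    (an illustrative encoding, not the paper's datalog representation).
    """
    positive = sum(1 for h in version_space if x in h)
    return min(positive, len(version_space) - positive) / len(version_space)


def active_learn(hypotheses, instances, oracle):
    """Query the most informative instance until one hypothesis remains."""
    version_space = list(hypotheses)
    queries = []
    while len(version_space) > 1:
        # Pick the instance that splits the version space most evenly.
        x = max(instances, key=lambda i: reduction_ratio(version_space, i))
        if reduction_ratio(version_space, x) == 0:
            break  # remaining hypotheses agree on every instance
        label = oracle(x)  # e.g. the outcome of one experiment
        # Refute every hypothesis inconsistent with the observed label.
        version_space = [h for h in version_space if (x in h) == label]
        queries.append(x)
    return version_space, queries


# Toy problem: 8 threshold hypotheses "x is positive iff x >= t".
instances = list(range(8))
hypotheses = [frozenset(range(t, 8)) for t in range(8)]
target = frozenset(range(5, 8))  # hidden ground truth, t = 5

remaining, asked = active_learn(hypotheses, instances, lambda x: x in target)
print(remaining, asked)
```

In this toy setting each query halves the version space, so 3 queries single out the target among 8 hypotheses, whereas randomly chosen instances may eliminate far fewer hypotheses per label.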

Figures (3)

  • Figure 1: Certain genetic mutations would block pathways, causing cells to die (positive label). $BMLP_{active}$ finds a gene-reaction association hypothesis to explain the pathway blockage and lethality. It encodes the GEM iML1515 as boolean matrices and uses them to classify genetic mutation experiment labels for every hypothesis. It consults a data source to request ground truth labels. $BMLP_{active}$ iteratively refutes hypotheses inconsistent with the labels.
  • Figure 2: (a) The vector $\textbf{v}$ encodes source chemical metabolites. All reactions are represented in the boolean matrices $\textbf{R}_1$ and $\textbf{R}_2$. (b) The module BMLP-IE computes $\textbf{v}^*$, the closure of reaction products, using binary AND over rows and boolean matrix addition (ADD), multiplication (MUL) and equality (EQ) [copilowish_matrix_1948].
  • Figure 3: Frequency of recovering the tyrB isoenzyme function. The experimental space had $(\binom{33}{2} + 33) \times 7 = 3927$ instances, covering synthetic double gene-knockout data and experimental single gene-knockout data for 33 key genes in 7 conditions. The hypothesis space contained $(27 \times 32 + 6 \times 31) + 2 = 1052$ candidate gene-enzyme associations related to 27 single-function genes, 6 double-function genes, the tyrB original function and an empty hypothesis.
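
The closure computation described in the Figure 2 caption can be sketched in plain Python. This is an illustration under stated assumptions, not the BMLP-IE module itself: it assumes row $i$ of $\textbf{R}_1$ marks the reactants of reaction $i$ and row $i$ of $\textbf{R}_2$ marks its products, and the function name `closure` and the toy network are hypothetical.

```python
def closure(v, R1, R2):
    """Return v*, the set of metabolites reachable from the sources in v.

    v  : list of 0/1, one entry per metabolite (1 = available)
    R1 : list of rows, R1[i][j] = 1 if metabolite j is a reactant of reaction i
    R2 : list of rows, R2[i][j] = 1 if metabolite j is a product of reaction i
    (This matrix layout is an assumption made for the sketch.)
    """
    v = list(v)
    while True:
        # AND over rows: a reaction fires when all of its reactants are in v.
        fired = [int(all(s for r, s in zip(row, v) if r)) for row in R1]
        # MUL: boolean product of the fired vector with R2 gives new products.
        products = [int(any(f and row[j] for f, row in zip(fired, R2)))
                    for j in range(len(v))]
        # ADD: boolean addition (OR) merges the products into v.
        new_v = [a | b for a, b in zip(v, products)]
        # EQ: stop once the vector no longer changes (fixpoint reached).
        if new_v == v:
            return new_v
        v = new_v


# Toy network over metabolites a, b, c, d, e (columns 0..4):
#   reaction 0: a + b -> c,  reaction 1: c -> d,  reaction 2: e -> a
R1 = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]
R2 = [[0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [1, 0, 0, 0, 0]]
sources = [1, 1, 0, 0, 0]  # a and b are supplied

wild_type = closure(sources, R1, R2)
# Removing reaction 0 mimics a knockout that blocks the pathway to c and d.
knockout = closure(sources, R1[1:], R2[1:])
print(wild_type, knockout)
```

Comparing the two closures shows how a blocked pathway appears as metabolites missing from $\textbf{v}^*$, which is the kind of signal a lethality label would be derived from.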

Theorems & Definitions (2)

  • definition 1: Boolean Matrix Logic Programming (BMLP) problem
  • theorem 1