Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction

Yuanfang Xiang, Lun Ai

TL;DR

ALIGNED addresses the challenge of predicting genetic perturbation responses by unifying data-driven learning with symbolic gene regulatory knowledge through Abductive Learning (ABL), enabling end-to-end neuro-symbolic alignment and systematic knowledge refinement. A new balanced consistency metric jointly evaluates accuracy against data and agreement with the knowledge base, while a gradient-free adaptor and a gradient-based refinement mechanism iteratively improve predictions and update the gene regulatory network (GRN). Across benchmark datasets and bacterial genome experiments, ALIGNED achieves higher balanced consistency than competing methods and re-discovers biologically meaningful interactions, demonstrating improved interpretability and continual knowledge evolution. The approach holds promise for transparent, knowledge-guided predictions in complex cellular systems and could extend to other biological networks and tasks.
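The adaptor is described here only at a high level. To illustrate the general ABL idea of reconciling neural predictions with a knowledge base, here is a minimal sketch that assumes signed (up/down) responses, a knockout perturbation, and a single confidence threshold. The function `abduce`, its threshold rule, and the gene names are hypothetical and are not the paper's adaptor: low-confidence predictions that contradict the KB are revised into pseudo-labels, while confident contradictions are kept and the KB edge is flagged as a refinement candidate.

```python
# Hypothetical sketch of a gradient-free alignment ("abduction") step.
# Signs: +1 = up-regulated, -1 = down-regulated after perturbation.
# KB edges: (regulator, target) -> +1 (activation) or -1 (repression).

def abduce(pred, confidence, perturbed, kb, threshold=0.8):
    """Revise predictions toward the KB, or flag KB edges for refinement.

    Assumes the perturbation is a knockout of `perturbed`, so an activation
    edge implies the target goes down and a repression edge implies it goes up.
    """
    revised = dict(pred)   # pseudo-labels that could be used to retrain a model
    flagged = []           # KB edges contradicted by confident predictions
    for (regulator, target), edge_sign in kb.items():
        if regulator != perturbed or target not in pred:
            continue       # edge does not constrain this perturbation
        implied = -edge_sign  # knockout: response sign is opposite to edge sign
        if pred[target] == implied:
            continue       # prediction already agrees with the KB
        if confidence[target] >= threshold:
            flagged.append((regulator, target))  # trust data, question the KB
        else:
            revised[target] = implied            # trust KB, revise the label
    return revised, flagged


pred = {"geneA": -1, "geneB": +1, "geneC": -1}
confidence = {"geneA": 0.9, "geneB": 0.6, "geneC": 0.95}
kb = {("regX", "geneA"): +1, ("regX", "geneB"): +1, ("regX", "geneC"): -1}

revised, flagged = abduce(pred, confidence, "regX", kb)
print(revised)  # {'geneA': -1, 'geneB': -1, 'geneC': -1}
print(flagged)  # [('regX', 'geneC')]
```

A fixed threshold is the simplest possible decision rule; a real adaptor would likely weigh evidence more carefully, but the sketch shows how one pass can both correct predictions and surface knowledge-base edges worth revisiting.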

Abstract

The transcriptional response to genetic perturbation reveals fundamental insights into complex cellular systems. While current approaches have made progress in predicting genetic perturbation responses, they provide limited biological understanding and cannot systematically refine existing knowledge. Overcoming these limitations requires an end-to-end integration of data-driven learning and existing knowledge. However, this integration is challenging due to inconsistencies between data and knowledge bases, such as noise, misannotation, and incompleteness. To address this challenge, we propose ALIGNED (Adaptive aLignment for Inconsistent Genetic kNowledgE and Data), a neuro-symbolic framework based on the Abductive Learning (ABL) paradigm. This end-to-end framework aligns neural and symbolic components and performs systematic knowledge refinement. We introduce a balanced consistency metric to evaluate the predictions' consistency against both data and knowledge. Our results show that ALIGNED outperforms state-of-the-art methods by achieving the highest balanced consistency, while also re-discovering biologically meaningful knowledge. Our work advances beyond existing methods to enable both the transparency and the evolution of mechanistic biological understanding.
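The balanced consistency metric is not defined in this summary. As a hedged illustration of how data consistency and knowledge consistency could be folded into one score, the sketch below uses a harmonic mean, which, like an F1 score, is high only when both components are high. The sign-agreement definitions, the knockout assumption, the harmonic mean, and all names are assumptions for illustration, not the paper's formula.

```python
# Hypothetical sketch of a balanced consistency score.
# Signs: +1 = up-regulated, -1 = down-regulated after perturbation.

def data_consistency(pred, observed):
    """Fraction of measured genes whose predicted sign matches the observed sign."""
    return sum(pred[g] == s for g, s in observed.items()) / len(observed)


def knowledge_consistency(pred, perturbed, kb):
    """Fraction of KB-constrained targets whose predicted sign matches the KB.

    Assumes a knockout of `perturbed`: an activation edge (+1) implies the
    target goes down, a repression edge (-1) implies it goes up.
    """
    implied = {t: -s for (r, t), s in kb.items() if r == perturbed and t in pred}
    if not implied:
        return 1.0  # no constraints: treated as vacuously consistent (an assumption)
    return sum(pred[t] == s for t, s in implied.items()) / len(implied)


def balanced_consistency(d, k):
    """Harmonic mean of data and knowledge consistency (an assumed choice)."""
    return 0.0 if d + k == 0 else 2 * d * k / (d + k)


pred = {"geneA": -1, "geneB": +1, "geneC": -1}
observed = {"geneA": -1, "geneB": +1, "geneC": +1}
kb = {("regX", "geneA"): +1, ("regX", "geneC"): -1}

d = data_consistency(pred, observed)         # 2/3
k = knowledge_consistency(pred, "regX", kb)  # 1/2
print(round(balanced_consistency(d, k), 4))  # 0.5714
```

Under this assumed definition, a method that matches the data perfectly but contradicts every applicable KB edge scores 0, which illustrates why a single balanced score can be more informative than reporting either component alone.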

Paper Structure

This paper contains 26 sections, 15 equations, 8 figures, 2 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Inconsistency between gene regulatory knowledge bases (KBs) and data-derived perturbation-response correlations. We examined the OmniPath turei_2016_omnipath, Gene Ontology (GO) ashburner_go_2000 and EcoCyc moore_ecocyc_2024 knowledge bases, and the human (norman_2019) and bacterial (Precise1k, lamoureux_multi-scale_2023) datasets.
  • Figure 2: The ALIGNED (Adaptive aLignment of Inconsistent Genetic kNowledgE and Data) framework. ALIGNED contains a neural component (blue), a symbolic component (green) and an adaptor (purple).
  • Figure 3: Balanced, data and knowledge consistency across methods.
  • Figure 4: Performance of the complete ALIGNED framework built with GNN.
  • Figure 5: GRN knowledge refinement performance with ALIGNED.
  • ...and 3 more figures