Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
Yuanfang Xiang, Lun Ai
TL;DR
ALIGNED addresses the challenge of predicting genetic perturbation responses by unifying data-driven learning with symbolic gene regulatory knowledge through Abductive Learning (ABL), enabling end-to-end neuro-symbolic alignment and systematic knowledge refinement. A new Balanced Consistency Metric jointly evaluates accuracy against data and agreement with the knowledge base, while a gradient-free adaptor and a gradient-based refinement mechanism iteratively improve predictions and update the gene regulatory network (GRN). Across benchmark datasets and bacterial genome experiments, ALIGNED achieves higher balanced consistency than existing methods and re-discovers biologically meaningful interactions, demonstrating improved interpretability and continual knowledge evolution. The approach holds promise for transparent, knowledge-guided predictions in complex cellular systems and could be extended to other biological networks and tasks.
Abstract
The transcriptional response to genetic perturbation reveals fundamental insights into complex cellular systems. While current approaches have made progress in predicting genetic perturbation responses, they provide limited biological understanding and cannot systematically refine existing knowledge. Overcoming these limitations requires an end-to-end integration of data-driven learning and existing knowledge. However, this integration is challenging because data and knowledge bases are often inconsistent with each other as a result of noise, misannotation, and incompleteness. To address this challenge, we propose ALIGNED (Adaptive aLignment for Inconsistent Genetic kNowledgE and Data), a neuro-symbolic framework based on the Abductive Learning (ABL) paradigm. This end-to-end framework aligns neural and symbolic components and performs systematic knowledge refinement. We introduce a balanced consistency metric that evaluates how consistent predictions are with both data and knowledge. Our results show that ALIGNED outperforms state-of-the-art methods by achieving the highest balanced consistency, while also re-discovering biologically meaningful knowledge. Our work advances beyond existing methods by enabling both the transparency and the evolution of mechanistic biological understanding.
