Guide-Guard: Off-Target Predicting in CRISPR Applications
Joseph Bingham, Netanel Arussy, Saman Zonouz
TL;DR
The paper addresses off-target safety in CRISPR-Cas13 workflows by introducing Guide-Guard, a convolutional neural network (CNN) that classifies gRNA safety across multiple genes. It develops a data-preparation pipeline that encodes 46 nucleotides per input (23 from the guide, 23 from the target) under an 8-class labeling scheme. It also designs a CNN architecture that, under 20-fold cross-validation, achieves an overall accuracy of $84\%$ and an area under the ROC curve of $0.839$, with fast inference ($0.00055$ s per input). Key findings show that mismatch position (particularly at nucleotide $18$) and nucleotide replacements (notably G/C) strongly influence binding, and that weighting these features improves performance by up to $3.8\%$. The work emphasizes cyberbiosecurity implications, arguing that pre-use screening with Guide-Guard can mitigate risks from malicious or erroneous gRNA sequences and enable safer, automated CRISPR workflows. The contribution is a practical, fast, multi-gene screening tool that can be integrated into gene-editing pipelines to filter out unsafe guides before synthesis or processing.
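The input-preparation idea (a 23-nt guide paired with a 23-nt target, plus mismatch-position features) can be pictured with a minimal sketch. It assumes a one-hot encoding over A/C/G/T (with U mapped to T), 1-based positions counted along each 23-nt string as written, and a hypothetical `apply_position_weight` helper that simply scales the rows at one position. The paper's actual encoding, its 8-class labels, and its feature-weighting scheme are not specified here, so this is an illustration under those assumptions, not the authors' implementation.

```python
from __future__ import annotations

# Assumption: one-hot over A, C, G, T; the paper's exact encoding may differ.
ALPHABET = "ACGT"
SEQ_LEN = 23  # guide and target lengths stated in the TL;DR


def _normalize(seq: str) -> str:
    """Uppercase, map RNA U to T, and validate length and alphabet."""
    seq = seq.upper().replace("U", "T")
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} nt, got {len(seq)}")
    bad = set(seq) - set(ALPHABET)
    if bad:
        raise ValueError(f"unexpected symbols: {sorted(bad)}")
    return seq


def one_hot(seq: str) -> list[list[float]]:
    """Encode one 23-nt sequence as 23 rows of 4 one-hot values."""
    seq = _normalize(seq)
    return [[1.0 if base == symbol else 0.0 for symbol in ALPHABET] for base in seq]


def encode_pair(guide: str, target: str) -> list[list[float]]:
    """Stack guide rows on top of target rows: 46 encoded nucleotides in total."""
    return one_hot(guide) + one_hot(target)


def mismatch_positions(guide: str, target: str) -> list[int]:
    """Return the 1-based positions where guide and target disagree."""
    g, t = _normalize(guide), _normalize(target)
    return [i + 1 for i, (a, b) in enumerate(zip(g, t)) if a != b]


def apply_position_weight(
    encoded: list[list[float]], position: int, factor: float
) -> list[list[float]]:
    """Hypothetical weighting: scale the guide and target rows at one position.

    Returns a new matrix and leaves the input unchanged.
    """
    weighted = [row[:] for row in encoded]
    for offset in (0, SEQ_LEN):  # guide block, then target block
        idx = offset + position - 1
        weighted[idx] = [v * factor for v in weighted[idx]]
    return weighted


if __name__ == "__main__":
    guide = "GACGCATAAAGATGAGACGCTGG"
    target = "GACGCATAAAGATGAGATGCTGG"  # single C->T change at position 18
    x = encode_pair(guide, target)
    print(len(x), len(x[0]))                 # 46 rows x 4 columns
    print(mismatch_positions(guide, target))  # [18]
    x_weighted = apply_position_weight(x, position=18, factor=2.0)
```

Keeping guide and target as two stacked 23-row blocks preserves positional alignment, so a CNN can compare the guide and target base at the same position; any up-weighting of a sensitive position such as 18 is shown here only as one possible, hypothetical choice.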
Abstract
With the introduction of cyber-physical genome sequencing and editing technologies such as CRISPR, researchers can more easily access tools to investigate and develop remedies across a variety of topics in genetics and health science (e.g., agriculture and medicine). As the field grows, new concerns arise regarding the ability to predict off-target behavior. In this work, we explore the underlying biological and chemical model from a data-driven perspective. Additionally, we present a machine-learning-based solution named \textit{Guide-Guard} that predicts the behavior of the system for a given gRNA in the CRISPR gene-editing process with 84\% accuracy. This solution can be trained on multiple genes at the same time while retaining accuracy.
