Guide-Guard: Off-Target Predicting in CRISPR Applications

Joseph Bingham, Netanel Arussy, Saman Zonouz

TL;DR

The paper addresses off-target safety in CRISPR-Cas13 workflows by introducing Guide-Guard, a convolutional neural network that classifies gRNA safety across multiple genes. It develops a data-preparation pipeline with 46 encoded nucleotides (23 guide, 23 target) and an 8-class labeling scheme, and designs a CNN architecture that achieves an overall accuracy of $84\%$ and an area under the ROC curve of $0.839$ with fast inference ($0.00055$ s per input) under 20-fold cross-validation. Key findings show that mismatch position, particularly at nucleotide $18$, and nucleotide replacements (notably G/C) strongly influence binding, and that weighting these features improves performance by up to $3.8\%$. The work emphasizes cyberbiosecurity implications, arguing that pre-use screening with Guide-Guard can mitigate risks from malicious or erroneous gRNA sequences, enabling safer, automated CRISPR workflows. The contribution lies in a practical, fast, multi-gene screening tool that can be integrated into gene-editing pipelines to reduce unsafe guides prior to synthesis or processing.
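The 46-nucleotide input described above (23 guide, 23 target) can be illustrated with a minimal sketch. The summary does not state the exact encoding, so the one-hot scheme, the RNA alphabet `ACGU`, the function names, and the convention that guide and target are written in the same orientation (a match means identical letters) are all assumptions made for illustration, not the authors' pipeline.

```python
from typing import List

NUCLEOTIDES = "ACGU"  # assumed RNA alphabet; DNA-style "T" is mapped to "U"
PAIR_LENGTH = 23      # 23 guide + 23 target = 46 encoded nucleotides


def normalize(seq: str) -> str:
    """Upper-case a sequence and map T to U so DNA-style input is accepted."""
    return seq.upper().replace("T", "U")


def one_hot(nt: str) -> List[int]:
    """Return a 4-element one-hot vector; raises ValueError on unknown letters."""
    idx = NUCLEOTIDES.index(nt)
    return [1 if i == idx else 0 for i in range(len(NUCLEOTIDES))]


def encode_pair(guide: str, target: str) -> List[List[int]]:
    """Encode a guide/target pair as a 46 x 4 one-hot matrix (assumed layout)."""
    guide, target = normalize(guide), normalize(target)
    if len(guide) != PAIR_LENGTH or len(target) != PAIR_LENGTH:
        raise ValueError(f"both sequences must have {PAIR_LENGTH} nucleotides")
    return [one_hot(nt) for nt in guide + target]


def mismatch_positions(guide: str, target: str) -> List[int]:
    """Return 1-based positions where guide and target differ."""
    pairs = zip(normalize(guide), normalize(target))
    return [i + 1 for i, (g, t) in enumerate(pairs) if g != t]


if __name__ == "__main__":
    guide = "ACGUACGUACGUACGUACGUACG"
    # Hypothetical target with a single C-to-G replacement at position 18.
    target = guide[:17] + "G" + guide[18:]
    matrix = encode_pair(guide, target)
    print(len(matrix), len(matrix[0]))
    print(mismatch_positions(guide, target))
```

A matrix of this shape is the kind of input a 1D convolutional classifier could consume, and the mismatch positions are the kind of feature the TL;DR reports as influential (for example, position 18).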

Abstract

With the introduction of cyber-physical genome sequencing and editing technologies, such as CRISPR, researchers can more easily access tools to investigate and create remedies for a variety of topics in genetics and health science (e.g., agriculture and medicine). As the field advances and grows, new concerns arise about the ability to predict off-target behavior. In this work, we explore the underlying biological and chemical model from a data-driven perspective. Additionally, we present a machine-learning-based solution named Guide-Guard that predicts the behavior of the system, given a gRNA in the CRISPR gene-editing process, with 84% accuracy. This solution can be trained on multiple different genes at the same time while retaining accuracy.

Paper Structure (12 sections, 8 figures, 1 table)

This paper contains 12 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: A general model for genetic modification. The process on the user end may start with the collection of the DNA or RNA (1), or pulling down the information from a database (2) if the sequence has already been cataloged. This solution protects the boundaries between (1) and (2) as well as (2) and (3).
  • Figure 2: A pictorial representation of CRISPR-Cas9 in action. Note that CRISPR-Cas13 functions nearly identically, but acts on RNA instead.
  • Figure 3: A histogram of the binding potential where three mismatches occur next to each other. The index given is that of the first mismatch.
  • Figure 4: A histogram detailing the binding potential as related to where a single mismatch occurs in the guide sequence.
  • Figure 6: A heat map detailing the binding potential as related to where two mismatches occur within a guide sequence, but may not be next to each other.
  • ...and 3 more figures