
AgriVariant: Variant Effect Prediction using DeepChem-Variant for Precision Breeding in Rice

Ankita Vaishnobi Bisoi, Bharath Ramsundar

TL;DR

AgriVariant addresses the bottleneck of crop variant interpretation by integrating DeepChem-Variant-based variant calling with plant-specific annotation and a database-independent deleteriousness scoring framework. The pipeline targets rice stress-response genes (OsDREB2a, OsDREB1F, SKC1, OsMT-3a) and correctly classifies the effects of targeted test variants. It also maps the exhaustive OsMT-3a mutational landscape of all 1,509 possible single-nucleotide variants in 10 days, compared with an estimated 2-4 years for wet-lab mutagenesis. The approach is fully open-source and modular within the DeepChem ecosystem, so it can be adapted to other crops with available reference genomes and annotations. By enabling rapid in silico variant prioritization, AgriVariant has the potential to accelerate precision breeding for climate resilience while reducing screening costs.

Abstract

Predicting the functional consequences of genetic variants in crop genes remains a critical bottleneck for precision breeding programs. We present AgriVariant, an end-to-end pipeline for variant-effect prediction in rice (Oryza sativa) that addresses the lack of crop-specific variant-interpretation tools and can be extended to any crop species with available reference genomes and gene annotations. Our approach integrates deep learning-based variant calling (DeepChem-Variant) with custom plant genomics annotation using RAP-DB gene models and database-independent deleteriousness scoring that combines the Grantham distance and the BLOSUM62 substitution matrix. We validate the pipeline through targeted mutations in stress-response genes (OsDREB2a, OsDREB1F, SKC1), demonstrating correct classification of stop-gained, missense, and synonymous variants with appropriate HIGH / MODERATE / LOW impact assignments. An exhaustive mutagenesis study of OsMT-3a analyzed all 1,509 possible single-nucleotide variants in 10 days, identifying 353 high-impact, 447 medium-impact, and 709 low-impact variants; the same analysis would have required 2-4 years using traditional wet-lab approaches. This computational framework enables breeders to prioritize variants for experimental validation across diverse crop species, reducing screening costs and accelerating development of climate-resilient crop varieties.
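To make the classification step in the abstract concrete, the following is a minimal sketch of exhaustive single-nucleotide mutagenesis over a coding sequence, labeling each variant as stop-gained, missense, or synonymous with the HIGH / MODERATE / LOW impact tiers named above. It is not the AgriVariant implementation: the function names `enumerate_snvs` and `classify_snv`, the toy sequence, and the handling of stop-lost variants (assigned HIGH here, following common annotation practice) are assumptions for illustration. The Grantham/BLOSUM62 deleteriousness score is not reproduced, since its formula is not given in this section.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def enumerate_snvs(cds):
    """Yield every (position, ref, alt) substitution: 3 alternatives per base."""
    for pos, ref in enumerate(cds):
        for alt in "ACGT":
            if alt != ref:
                yield pos, ref, alt


def classify_snv(cds, pos, alt):
    """Return (consequence, impact) for a substitution at 0-based position pos.

    Hypothetical helper: assumes cds is an in-frame coding sequence.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = pos - pos % 3
    offset = pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous", "LOW"
    if alt_aa == "*":
        return "stop_gained", "HIGH"
    if ref_aa == "*":
        # Not discussed in the abstract; HIGH is an assumption borrowed
        # from common variant-annotation conventions.
        return "stop_lost", "HIGH"
    return "missense", "MODERATE"


# Toy in-frame sequence (Met-Cys-Lys), not a real rice gene.
toy_cds = "ATGTGCAAA"
variants = list(enumerate_snvs(toy_cds))
impact_counts = Counter(classify_snv(toy_cds, pos, alt)[1] for pos, _, alt in variants)
print(len(variants), dict(impact_counts))
```

A sequence of length n yields 3n substitutions under this enumeration, so the reported total of 1,509 variants is consistent with three alternative bases at each of 503 positions.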

Paper Structure

This paper contains 24 sections, 1 equation, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: AgriVariant pipeline integrating deep learning-based variant calling, plant-specific functional annotation, and quantitative deleteriousness scoring.
  • Figure 2: OsDREB-mediated drought stress response pathway, comprising stress-induced gene activation, transcription factor production, downstream target gene regulation, and the resulting drought tolerance.
  • Figure 3: SKC1-mediated salt stress response pathway. Salt stress activates SKC1 gene expression, producing HKT1;5 transporter protein that regulates Na+/K+ homeostasis by reducing sodium accumulation in shoots while maintaining potassium levels, resulting in salt tolerance.
  • Figure 4: OsMT-3a-mediated heavy metal stress response pathway. Heavy metal exposure (Cd, Cu, Zn) induces OsMT-3a expression, producing metallothionein protein that binds and sequesters toxic metal ions while scavenging reactive oxygen species, conferring heavy metal tolerance.
  • Figure 5: DeepChem-Variant workflow: Reference genome and aligned reads are processed through candidate detection, pileup image generation (6-channel tensors), CNN classification (MobileNetV2 or InceptionV3), and VCF conversion to produce annotated variant calls.
  • ...and 2 more figures