AgriVariant: Variant Effect Prediction using DeepChem-Variant for Precision Breeding in Rice
Ankita Vaishnobi Bisoi, Bharath Ramsundar
TL;DR
AgriVariant addresses the bottleneck of crop variant interpretation by integrating DeepChem-Variant-based variant calling with plant-specific annotation and a database-independent deleteriousness scoring framework. The pipeline targets rice stress-response genes (OsDREB2a, OsDREB1F, SKC1, OsMT-3a). It correctly classifies stop-gained, missense, and synonymous test variants, and it maps the exhaustive OsMT-3a mutational landscape (all 1,509 possible single-nucleotide variants) in 10 days, compared with the 2-4 years that traditional wet-lab mutagenesis would have required. The approach is fully open-source and modular within the DeepChem ecosystem, so it can be adapted to other crops with available reference genomes and annotations. By enabling rapid in silico variant prioritization, AgriVariant has the potential to accelerate precision breeding for climate resilience while reducing screening costs.
Abstract
Predicting the functional consequences of genetic variants in crop genes remains a critical bottleneck for precision breeding programs. We present AgriVariant, an end-to-end pipeline for variant-effect prediction in rice (Oryza sativa) that addresses the lack of crop-specific variant-interpretation tools and can be extended to any crop species with an available reference genome and gene annotations. Our approach integrates three components: deep-learning-based variant calling (DeepChem-Variant), custom plant-genomics annotation using RAP-DB gene models, and database-independent deleteriousness scoring that combines the Grantham distance with the BLOSUM62 substitution matrix. We validate the pipeline through targeted mutations in stress-response genes (OsDREB2a, OsDREB1F, SKC1), demonstrating correct classification of stop-gained, missense, and synonymous variants with appropriate HIGH / MODERATE / LOW impact assignments. An exhaustive mutagenesis study of OsMT-3a analyzed all 1,509 possible single-nucleotide variants in 10 days, identifying 353 high-impact, 447 medium-impact, and 709 low-impact variants; the same analysis would have required 2-4 years using traditional wet-lab approaches. This computational framework enables breeders to prioritize variants for experimental validation across diverse crop species, reducing screening costs and accelerating the development of climate-resilient crop varieties.
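To make the annotation and scoring steps concrete, the following is a minimal Python sketch, not the AgriVariant implementation. It classifies a single codon change as stop-gained, missense, or synonymous, and scores a missense change by combining the Grantham distance with the BLOSUM62 score. The function names, the equal weighting of the two terms, the normalization constants, and the four-pair excerpt of the Grantham and BLOSUM62 tables are all assumptions made for illustration; the consequence-to-impact mapping follows the common annotation convention rather than anything stated in the text above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative excerpt only: (Grantham distance, BLOSUM62 score) per residue pair.
# A real pipeline would load the complete 20 x 20 tables.
PAIR_SCORES = {
    frozenset("LI"): (5, 2),
    frozenset("DE"): (45, 2),
    frozenset("SR"): (110, -1),
    frozenset("CW"): (215, -2),
}

GRANTHAM_MAX = 215                    # largest Grantham distance (Cys <-> Trp)
BLOSUM_MIN, BLOSUM_MAX_OFF = -4, 3    # BLOSUM62 range for non-identical pairs


def classify(ref_codon: str, alt_codon: str) -> tuple[str, str]:
    """Return (consequence, impact) for a single-codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous", "LOW"
    if alt_aa == "*":
        return "stop_gained", "HIGH"
    if ref_aa == "*":
        return "other", "UNKNOWN"     # e.g. stop-lost; outside this sketch
    return "missense", "MODERATE"


def missense_score(ref_aa: str, alt_aa: str, weight: float = 0.5) -> float:
    """Hypothetical combined score in [0, 1]; higher means more disruptive."""
    grantham, blosum = PAIR_SCORES[frozenset(ref_aa + alt_aa)]
    grantham_term = grantham / GRANTHAM_MAX
    # Low (negative) BLOSUM62 scores mark rare substitutions, so invert the scale.
    blosum_term = (BLOSUM_MAX_OFF - blosum) / (BLOSUM_MAX_OFF - BLOSUM_MIN)
    return weight * grantham_term + (1 - weight) * blosum_term


if __name__ == "__main__":
    print(classify("TGG", "TGA"))     # Trp codon mutated to a stop codon
    print(classify("GAT", "GAA"))     # Asp to Glu
    print(round(missense_score("D", "E"), 3))
    print(round(missense_score("C", "W"), 3))
```

Because both terms depend only on the two amino acids involved, a score of this kind needs no variant database or conservation track, which is what allows it to be applied to crop genes that lack curated variant annotations.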
