DPCformer: An Interpretable Deep Learning Model for Genomic Prediction in Crops
Pengcheng Deng, Kening Liu, Mengxi Zhou, Mingxi Li, Rui Yang, Chuzhe Cao, Maojun Wang, Zeyu Zhang
TL;DR
DPCformer addresses core challenges of genomic selection in crops by combining CNN-based extraction of local SNP features with a multi-head self-attention Transformer, so that it models both intra- and inter-chromosomal genotype-phenotype relationships. It introduces an eight-dimensional SNP encoding, MIC-based feature selection, chromosome segmentation driven by the MAP file, and a polyploid-aware processing pipeline for cotton. Together these enable robust predictions even with small samples. The architecture pairs a per-chromosome Res-CNN with a cross-chromosome Transformer and an MLP predictor. It is trained with a mean squared error (MSE) loss and evaluated with 10-fold cross-validation across five crops and multiple traits. DPCformer achieves state-of-the-art accuracy, and SHAP analyses provide interpretability by highlighting biologically plausible candidate genes. These contributions offer a scalable, interpretable framework for precision breeding, with the potential to accelerate genetic gain and support global food security.
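The MAP-file chromosome segmentation step can be illustrated with a minimal sketch. It assumes a PLINK-style `.map` layout (chromosome, SNP identifier, genetic distance, base-pair position per line) and a genotype matrix whose columns follow the MAP row order. The function names `split_by_chromosome` and `segment_genotypes` are hypothetical, and this is not the authors' implementation. Each resulting per-chromosome segment is the kind of input a per-chromosome Res-CNN would consume.

```python
from collections import defaultdict


def split_by_chromosome(map_lines):
    """Group genotype-matrix column indices by chromosome.

    Assumes PLINK-style .map lines: chromosome, SNP ID, genetic distance,
    base-pair position. Row k of the MAP file is assumed to describe
    column k of the genotype matrix. Within each chromosome, columns are
    sorted by base-pair position. Chromosomes keep first-appearance order.
    """
    segments = defaultdict(list)
    col = 0
    for line in map_lines:
        fields = line.split()
        if not fields:  # skip blank lines without consuming a column index
            continue
        chrom, _snp_id, _genetic_dist, bp = fields[:4]
        segments[chrom].append((int(bp), col))
        col += 1
    return {chrom: [c for _, c in sorted(entries)]
            for chrom, entries in segments.items()}


def segment_genotypes(genotypes, segments):
    """Slice every sample's genotype row into per-chromosome sub-sequences."""
    return {chrom: [[row[i] for i in idx] for row in genotypes]
            for chrom, idx in segments.items()}


if __name__ == "__main__":
    map_lines = ["1 snp1 0 500", "1 snp2 0 100", "2 snp3 0 300", "", "2 snp4 0 200"]
    genotypes = [[0, 1, 2, 0], [2, 2, 1, 1]]  # toy 0/1/2 allele counts
    segs = split_by_chromosome(map_lines)
    print(segs)                                # {'1': [1, 0], '2': [3, 2]}
    print(segment_genotypes(genotypes, segs))
```

Keeping segmentation separate from encoding means the same column-index map can be reused for any per-SNP representation, including a multi-dimensional one-hot encoding.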
Abstract
Genomic selection (GS) uses whole-genome information to predict crop phenotypes and accelerate breeding. Traditional GS methods, however, struggle to achieve high prediction accuracy for complex traits and on large datasets. We propose DPCformer, a deep learning model that integrates convolutional neural networks with a self-attention mechanism to model complex genotype-phenotype relationships. We applied DPCformer to 13 traits across five crops (maize, cotton, tomato, rice, and chickpea). Our approach encodes SNP data with an 8-dimensional one-hot scheme, ordered by chromosome, and applies the PMF algorithm for feature selection. In our evaluations, DPCformer outperforms existing methods. On maize datasets, accuracy for traits such as days to tasseling and plant height improved by up to 2.92%. For cotton, accuracy gains on fiber traits reached 8.37%. On small-sample tomato data, the Pearson correlation coefficient (PCC) for a key trait increased by up to 57.35%. In chickpea, the correlation for yield rose by 16.62%. DPCformer demonstrates superior accuracy, robustness in small-sample scenarios, and enhanced interpretability, providing a powerful tool for precision breeding and for addressing global food security challenges.
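The reported gains are expressed through the Pearson correlation coefficient between observed and predicted phenotypes. The abstract does not say whether its percentages are relative or absolute improvements, so the sketch below assumes a relative improvement over a baseline PCC. The helper names `pearson` and `relative_gain` are hypothetical, and the inputs in the usage lines are toy values, not results from the paper.

```python
import math


def pearson(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(observed) / n
    my = sum(predicted) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(observed, predicted))
    sx = math.sqrt(sum((a - mx) ** 2 for a in observed))
    sy = math.sqrt(sum((b - my) ** 2 for b in predicted))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / (sx * sy)


def relative_gain(new_score, baseline_score):
    """Percentage improvement of new_score over baseline_score (assumed definition)."""
    return 100.0 * (new_score - baseline_score) / abs(baseline_score)


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.8]
    print(pearson(y_true, y_pred))
    print(relative_gain(0.6, 0.5))  # a PCC rising from 0.5 to 0.6 is roughly a 20% relative gain
```

Under this definition, a large relative gain such as the one reported for tomato can come from a modest absolute change when the baseline PCC is low, which is typical of small-sample datasets.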
