LSTM Autoencoder-based Deep Neural Networks for Barley Genotype-to-Phenotype Prediction

Guanjin Wang; Junyu Xuan; Penghao Wang; Chengdao Li; Jie Lu

TL;DR

This study addresses barley genotype-to-phenotype prediction from high-dimensional genomic and environmental data. It introduces an LSTM autoencoder-based deep neural network that first pretrains an LSTM encoder on genomic data to learn latent representations, then uses a DNN predictor that combines these embeddings with environmental covariates to predict flowering time ($ZS49$) and grain yield ($GrYld$). The approach outperforms baselines including XGBoost, an MLP with and without genomic features, a CNN, and LSTM variants, demonstrating the value of sequential modeling and genomic pretraining for complex trait prediction. The work has practical implications for breeding and crop management, since more accurate genotype-to-phenotype predictions in barley can inform both. The authors plan to extend the approach to additional crops and to time-series environmental data.
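The two-stage design described above can be illustrated with a minimal, dependency-free Python sketch of the forward passes. Everything in it is an assumption for illustration rather than the authors' implementation: the class names (`LSTM`, `LSTMAutoencoder`, `MLPHead`), the 0/1/2 marker coding, grouping markers into fixed-length segments that act as LSTM time steps, using the encoder's last hidden state as the embedding, and a decoder that receives that embedding at every step. Weights are random and untrained, so the predicted values are placeholders.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Single-layer LSTM with one weight matrix per gate acting on [x_t, h_{t-1}]."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}

    def _gate(self, name, z, act):
        return [act(a + b) for a, b in zip(matvec(self.W[name], z), self.b[name])]

    def run(self, sequence):
        """Return the hidden state after every time step."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        states = []
        for x_t in sequence:
            z = list(x_t) + h
            i = self._gate("i", z, sigmoid)    # input gate
            f = self._gate("f", z, sigmoid)    # forget gate
            o = self._gate("o", z, sigmoid)    # output gate
            g = self._gate("g", z, math.tanh)  # candidate cell state
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            states.append(h)
        return states


class LSTMAutoencoder:
    """Stage 1 (hypothetical layout): compress a segmented genotype into one embedding."""

    def __init__(self, segment_len, embed_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = LSTM(segment_len, embed_dim, rng)
        self.decoder = LSTM(embed_dim, embed_dim, rng)
        self.W_out = rand_matrix(segment_len, embed_dim, rng)

    def encode(self, segments):
        return self.encoder.run(segments)[-1]  # last hidden state = latent embedding

    def reconstruction_mse(self, segments):
        z = self.encode(segments)
        # Feed the same embedding at every decoder step, then project back to segment space.
        outputs = [matvec(self.W_out, h) for h in self.decoder.run([z] * len(segments))]
        errs = [(p - x) ** 2 for seg, out in zip(segments, outputs) for p, x in zip(out, seg)]
        return sum(errs) / len(errs)


class MLPHead:
    """Stage 2: map [genomic embedding, environmental covariates] to (ZS49, GrYld)."""

    def __init__(self, sizes, seed=1):
        rng = random.Random(seed)
        self.layers = [(rand_matrix(n_out, n_in, rng), [0.0] * n_out)
                       for n_in, n_out in zip(sizes[:-1], sizes[1:])]

    def __call__(self, x):
        for k, (W, b) in enumerate(self.layers):
            x = [a + c for a, c in zip(matvec(W, x), b)]
            if k < len(self.layers) - 1:
                x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
        return x


# Toy input: 12 markers coded 0/1/2, split into 4 segments of 3; 2 environmental covariates.
genotype = [0, 2, 1, 1, 0, 0, 2, 2, 1, 0, 1, 2]
segments = [genotype[i:i + 3] for i in range(0, len(genotype), 3)]
env = [0.5, -1.2]
ae = LSTMAutoencoder(segment_len=3, embed_dim=8)
head = MLPHead([8 + len(env), 16, 2])
pred = head(ae.encode(segments) + env)  # untrained weights: values are placeholders
```

In this reading, pretraining would minimize `reconstruction_mse` over genotypes, after which `MLPHead` is fitted on the concatenated embedding and environmental covariates against observed ZS49 and GrYld; whether the encoder is frozen or fine-tuned in the second stage is not stated in this summary. Plain lists keep the sketch self-contained; a practical model would use a framework with automatic differentiation.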

Abstract

Artificial Intelligence (AI) has emerged as a key driver of precision agriculture, supporting higher crop productivity, more efficient resource use, farm sustainability, and better-informed decision-making. Meanwhile, advances in genome sequencing technology have greatly expanded crop genomic resources, deepening our understanding of genetic variation and helping to improve desirable crop traits for performance across diverse environments. There is growing interest in using machine learning (ML) and deep learning (DL) algorithms for genotype-to-phenotype prediction because they excel at capturing complex interactions within large, high-dimensional datasets. In this work, we propose a new LSTM autoencoder-based model for barley genotype-to-phenotype prediction, specifically for estimating flowering time and grain yield, which could help optimize yields and management practices. Our model outperformed the baseline methods, demonstrating its potential for handling complex, high-dimensional agricultural datasets and improving crop phenotype prediction.

Paper Structure (12 sections, 4 equations, 4 figures, 4 tables)

Introduction
Previous Work
Crop genotype-to-phenotype prediction
LSTM
Methodology
Genomic data encoding
Genotype-to-phenotype prediction
Experiments
Barley dataset
Experiment settings
Results
Conclusion

Figures (4)

Figure 1: Our LSTM autoencoder-based deep neural network framework
Figure 2: Impact of MLP depth on predictive results (MAE): Left - ZS49, Right - GrYld
Figure 3: Impact of gene embedding dimension on predictive results (MAE): Left - ZS49, Right - GrYld
Figure 4: Impact of dimension segment length on predictive results (MAE): Left - ZS49, Right - GrYld
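Figures 2 to 4 report mean absolute error (MAE) while varying MLP depth, embedding dimension, and segment length. Assuming that "segment length" means the number of consecutive markers grouped into one LSTM input step (a reading not confirmed in this summary), the sketch below shows how that choice trades the number of time steps against per-step width, together with the MAE metric itself. The helper names `segment_genotype` and `mean_absolute_error`, and right-padding the last segment with zeros, are hypothetical choices made for illustration.

```python
def segment_genotype(markers, segment_len, pad_value=0):
    """Split a marker vector into consecutive segments of length segment_len.

    The final segment is right-padded with pad_value (an assumption) so that
    every LSTM time step has the same input width.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    segments = []
    for start in range(0, len(markers), segment_len):
        chunk = list(markers[start:start + segment_len])
        chunk += [pad_value] * (segment_len - len(chunk))
        segments.append(chunk)
    return segments


def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - y_hat_i|), computed separately for each trait."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


markers = [0, 2, 1, 1, 0, 2, 2]
# Shorter segments produce more (narrower) LSTM time steps.
steps_per_len = {s: len(segment_genotype(markers, s)) for s in (2, 3, 7)}  # {2: 4, 3: 3, 7: 1}
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])  # 0.5
```

Under this reading, a small segment length gives the LSTM long sequences of narrow inputs, while a large one approaches a single wide input step with little sequential structure left to model.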
