
LSTM Autoencoder-based Deep Neural Networks for Barley Genotype-to-Phenotype Prediction

Guanjin Wang, Junyu Xuan, Penghao Wang, Chengdao Li, Jie Lu

TL;DR

This study addresses barley genotype-to-phenotype prediction in high-dimensional genomic and environmental data. It introduces an LSTM autoencoder-based deep neural network that first pretrains an LSTM encoder on genomic data to learn latent representations, then uses a DNN predictor that combines these embeddings with environmental covariates to predict flowering time ($ZS49$) and grain yield ($GrYld$). The approach outperforms baselines such as XGBoost, MLP with/without genomic features, CNN, and LSTM variants, demonstrating the value of sequential modeling and genomic pretraining for complex trait prediction. The work has practical implications for breeding and management by enabling more accurate genotype-to-phenotype predictions in barley, with plans to extend to additional crops and time-series environmental data.
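The two-stage design summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The 0/1/2 SNP coding, the zero-padded segmentation into fixed-length chunks, the use of the encoder's last hidden state as the embedding, the repeat-vector decoder, and all sizes (`SEG_LEN`, `EMB_DIM`, `N_ENV`, layer widths) are assumptions chosen for illustration. Weights are random rather than trained, so the sketch shows only the forward computations: the reconstruction loss that stage-one pretraining would minimize, and a predictor that maps the concatenated genomic embedding and environmental covariates to the two traits.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def segment_genotype(snps, seg_len):
    """Split a 1-D SNP vector into a (num_segments, seg_len) sequence, zero-padding the tail."""
    n_seg = -(-len(snps) // seg_len)  # ceiling division
    padded = np.zeros(n_seg * seg_len, dtype=float)
    padded[:len(snps)] = snps
    return padded.reshape(n_seg, seg_len)

class LSTMCell:
    """Plain LSTM run step by step; weights are random (untrained) for illustration."""
    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        # One stacked matrix for the input, forget, candidate and output gates.
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def run(self, seq):
        """Run over a (T, input_dim) sequence and return all hidden states, shape (T, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x_t in seq:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)

def init_mlp(sizes, rng):
    """Random (W, b) pairs for a fully connected network with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (n, m)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, layers):
    """ReLU between hidden layers, linear output layer."""
    for W, b in layers[:-1]:
        x = np.maximum(0.0, W @ x + b)
    W, b = layers[-1]
    return W @ x + b

rng = np.random.default_rng(0)
SEG_LEN, EMB_DIM, N_ENV = 50, 16, 8  # hypothetical sizes, not taken from the paper

snps = rng.integers(0, 3, size=1000).astype(float)  # toy genotype, assumed 0/1/2 coding
seq = segment_genotype(snps, SEG_LEN)               # shape (20, 50): 20 LSTM time steps

encoder = LSTMCell(SEG_LEN, EMB_DIM, rng)
decoder = LSTMCell(EMB_DIM, EMB_DIM, rng)
W_out = rng.normal(0.0, 0.1, (SEG_LEN, EMB_DIM))    # maps decoder states back to segments

# Stage 1 (pretraining objective): encode the genotype, then reconstruct its segments.
embedding = encoder.run(seq)[-1]                     # last hidden state, shape (16,)
recon = decoder.run(np.tile(embedding, (len(seq), 1))) @ W_out.T  # shape (20, 50)
recon_loss = np.mean((recon - seq) ** 2)

# Stage 2: predictor on [genomic embedding, environmental covariates] -> two traits.
env = rng.normal(size=N_ENV)                         # toy environmental covariates
predictor = init_mlp([EMB_DIM + N_ENV, 64, 32, 2], rng)
zs49_pred, gryld_pred = mlp_forward(np.concatenate([embedding, env]), predictor)
print(recon_loss, zs49_pred, gryld_pred)
```

In an actual pipeline the encoder and decoder would first be trained on the reconstruction loss, after which the predictor is trained on phenotype labels; whether the encoder is frozen or fine-tuned in that second stage is a design choice this sketch does not settle.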

Abstract

Artificial Intelligence (AI) has emerged as a key driver of precision agriculture, facilitating enhanced crop productivity, optimized resource use, farm sustainability, and informed decision-making. In parallel, advances in genome sequencing technology have greatly expanded crop genomic resources, deepening our understanding of genetic variation and supporting the improvement of desirable crop traits for performance across diverse environments. There is growing interest in using machine learning (ML) and deep learning (DL) algorithms for genotype-to-phenotype prediction because of their ability to capture complex interactions within large, high-dimensional datasets. In this work, we propose a new LSTM autoencoder-based model for barley genotype-to-phenotype prediction, specifically for flowering time and grain yield estimation, which could help optimize yields and management practices. Our model outperformed the baseline methods, demonstrating its potential for handling complex, high-dimensional agricultural datasets and improving crop phenotype prediction.

Paper Structure

This paper contains 12 sections, 4 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Our LSTM autoencoder-based deep neural network framework
  • Figure 2: The impact of MLP depth on predictive results (MAE): Left - ZS49, Right - GrYld
  • Figure 3: The impact of gene embedding dimension on predictive results (MAE): Left - ZS49, Right - GrYld
  • Figure 4: The impact of dimension segment length on predictive results (MAE): Left - ZS49, Right - GrYld
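Figures 2 to 4 report mean absolute error (MAE) separately for each trait. A small helper for that metric, plus the arithmetic linking segment length to the number of LSTM time steps, might look like the sketch below. The helper names and all numbers are hypothetical toy values, and the time-step calculation assumes markers are split into consecutive fixed-length chunks with the last chunk padded, which may differ from the paper's exact scheme.

```python
import numpy as np

def per_trait_mae(y_true, y_pred, trait_names=("ZS49", "GrYld")):
    """Mean absolute error computed separately for each trait column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = np.mean(np.abs(y_true - y_pred), axis=0)  # one MAE per column
    return dict(zip(trait_names, errors.tolist()))

def num_timesteps(num_markers, seg_len):
    """Sequence length seen by the LSTM when markers are cut into chunks of seg_len (ceiling)."""
    return -(-num_markers // seg_len)

# Toy values (hypothetical): two lines, columns are [ZS49, GrYld].
maes = per_trait_mae([[100, 5.0], [110, 6.0]], [[102, 4.5], [107, 6.5]])
print(maes)                     # {'ZS49': 2.5, 'GrYld': 0.5}
print(num_timesteps(1000, 50))  # 20
print(num_timesteps(1000, 64))  # 16
```

Shorter segments give the LSTM longer sequences of smaller inputs, while longer segments compress the genome into fewer, wider time steps; that trade-off is the kind of effect a segment-length sweep probes.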