REML implementations of kernel-based genomic prediction models for genotype x environment x management interactions
Killian A. C. Melsen, Salvador Gezan, Daniel J. Tolhurst, Fred A. van Eeuwijk, Carel F. W. Peeters
TL;DR
This work provides kernel-based genomic prediction models, fitted by REML, tailored for genotype-by-environment-by-management (GxExM) interactions in multi-environment trials. By implementing both linear and Gaussian kernels within standard mixed-model software and allowing environment-specific genetic variances, the authors show that more of the GxE variance is explained and that prediction improves, especially under sparse testing. The approach is validated on two real datasets (BRIWECS and DROPS), showing that nonlinear Gaussian kernels and heterogeneous variances yield higher accuracy than traditional main-effect or factor-analytic models. The framework facilitates integration of environmental covariables, phenomics, and genomics, enabling more flexible and scalable modeling of complex breeding datasets with potential extensions to multi-trait and high-throughput phenotyping data.
Abstract
High-throughput pheno-, geno-, and envirotyping allows characterization of plant genotypes and the trials in which they are evaluated, producing different types of -omics data. These different data modalities can be integrated into statistical or machine learning models for genomic prediction in several ways. One commonly used approach within the analysis of multi-environment trial data in plant breeding is to create linear or nonlinear kernels, which are subsequently used in linear mixed models (LMMs) to model genotype by environment (GxE) interactions. Current implementations of these kernel-based LMMs leave room for several methodological extensions. Here we show how these models can be implemented in standard software, allowing direct restricted maximum likelihood (REML) estimation of all parameters. We also extend the models by combining the kernels with unstructured covariance matrices for three-way interactions in genotype by environment by management (GxExM) datasets, while simultaneously allowing for environment-specific genetic variances. We show that the models incorporating nonlinear kernels and heterogeneous variances maximize the amount of genetic variance captured by environmental covariables and perform best in prediction settings. We discuss the opportunities regarding models with multiple kernels or kernels obtained after environmental feature selection, as well as the similarities to models regressing phenotypes on latent and observed environmental covariables. Finally, we discuss the flexibility provided by our implementation in terms of modeling complex plant breeding datasets, allowing for straightforward integration of phenomics, enviromics, and genomics.
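To make the kernel construction concrete, the following is a minimal sketch of how linear and Gaussian kernels between environments might be built from environmental covariables, and how environment-specific variances could then be imposed on a kernel. It is an illustration under stated assumptions, not the authors' implementation: the covariable values, the variances, the bandwidth parametrization of the Gaussian kernel, and all function names are hypothetical, and the scaling D^(1/2) K D^(1/2) is only one common way to obtain heterogeneous variances. The code is deliberately dependency-free and does not perform REML estimation; in practice the resulting matrices would be passed to mixed-model software.

```python
import math


def standardize(W):
    """Center each covariable (column) and scale it to unit variance."""
    n, q = len(W), len(W[0])
    means = [sum(row[j] for row in W) / n for j in range(q)]
    sds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in W) / (n - 1))
        for j in range(q)
    ]
    return [[(row[j] - means[j]) / sds[j] for j in range(q)] for row in W]


def linear_kernel(W):
    """K = W W' / q: average cross-product of standardized covariables."""
    q = len(W[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / q for rj in W] for ri in W]


def gaussian_kernel(W, bandwidth=1.0):
    """K_ij = exp(-bandwidth * ||w_i - w_j||^2 / q) (assumed parametrization)."""
    q = len(W[0])
    return [
        [
            math.exp(-bandwidth * sum((a - b) ** 2 for a, b in zip(ri, rj)) / q)
            for rj in W
        ]
        for ri in W
    ]


def scale_by_env_variances(K, variances):
    """Return D^(1/2) K D^(1/2), where D is the diagonal matrix of variances."""
    sd = [math.sqrt(v) for v in variances]
    n = len(K)
    return [[sd[i] * K[i][j] * sd[j] for j in range(n)] for i in range(n)]


# Hypothetical covariables for 4 environments (rows): temperature, rainfall.
W_raw = [[18.0, 300.0], [21.0, 250.0], [25.0, 120.0], [19.0, 310.0]]
W = standardize(W_raw)

K_lin = linear_kernel(W)
K_gau = gaussian_kernel(W, bandwidth=1.0)

# Hypothetical environment-specific genetic variances.
env_variances = [1.0, 0.5, 2.0, 1.5]
G_env = scale_by_env_variances(K_gau, env_variances)
```

Because the Gaussian kernel has a unit diagonal, it behaves like a correlation matrix between environments, so after scaling the diagonal of `G_env` equals the chosen variances. In this toy example, environments 1 and 4 have similar covariables and therefore a much higher Gaussian-kernel similarity than environments 1 and 3.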
