LinkedNN: a neural model of linkage disequilibrium decay for recent effective population size inference

Chris C R Smith

TL;DR

A bioinformatics tool is presented for estimating recent effective population size by using a neural network to automatically compute linkage disequilibrium-related features as a function of genomic distance between polymorphisms, making it particularly valuable for molecular ecology applications with sparse, unphased data.

Abstract

Summary: A bioinformatics tool is presented for estimating recent effective population size by using a neural network to automatically compute linkage disequilibrium-related features as a function of genomic distance between polymorphisms. The new method outperforms existing deep learning and summary statistic-based approaches using relatively few sequenced individuals and variant sites, making it particularly valuable for molecular ecology applications with sparse, unphased data.

Availability and implementation: The program is available as an easily installable Python package with documentation here: https://pypi.org/project/linkedNN/. The open source code is available from: https://github.com/the-smith-lab/LinkedNN.

Paper Structure (17 sections, 1 equation, 2 figures, 1 table)

Figures (2)

  • Figure 1: (Left) Neural network diagram. The inputs are genotypes for all SNP pairs $x_1, \dots, x_P$ and the corresponding genomic distances $d_1, \dots, d_P$. The number of filters is $f=64$, and rectified linear unit (ReLU) activation is used on all trainable layers except the final layer. Radial basis functions (RBF) are applied to the raw distances. The output can be effective population size, $N_e$, or another target that the user defines. (Top right) Evaluation of the LD layer on 1,000 simulations held out from training. Axes are log-scale. The performance metric is mean relative absolute error (MRAE). (Bottom right) Blue lines are the $f=64$ different coefficients output by the distance-mapping network of the pre-trained LD layer for a range of genomic distance inputs, $d_1, \dots, d_P$, irrespective of genotypes. Larger values indicate distances that the model weights as important for scaling particular genetic features. Black and grey dashed lines are binned $r^2$ values calculated on SNP pairs from five $N_e=10^2$ and five $N_e=10^4$ simulations, respectively. Bins for distances smaller than 1,000 bp were omitted because they contain too few SNPs.
  • Figure S1: Blue lines are the $f=64$ different coefficients output by the distance-mapping network at initialization (without training) for a range of distance inputs.
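
The quantities named in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not taken from the LinkedNN source code, and it rests on several assumptions: the RBF is taken to be Gaussian with hypothetical centers and width, $r^2$ is computed as the squared Pearson correlation between unphased genotype dosages (0/1/2), and MRAE is taken to be the mean of $|\hat{y} - y| / y$. All function names are illustrative.

```python
import math


def rbf_features(distance, centers, width):
    """Gaussian radial basis expansion of one genomic distance.

    The Gaussian form, the centers, and the width are assumptions made
    for illustration; the caption only states that RBFs are applied.
    """
    return [math.exp(-((distance - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two unphased genotype vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")  # undefined when either site is monomorphic
    return cov * cov / (v1 * v2)


def binned_r2(pairs, bin_edges):
    """Mean r^2 per distance bin; pairs are (distance_bp, g1, g2) tuples.

    Bins are half-open [edge_i, edge_{i+1}); empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for d, g1, g2 in pairs:
        r2 = genotype_r2(g1, g2)
        if math.isnan(r2):
            continue  # skip pairs with a monomorphic site
        for i in range(n_bins):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                sums[i] += r2
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


def mrae(predicted, true):
    """Mean relative absolute error: mean of |predicted - true| / true."""
    return sum(abs(p - t) / t for p, t in zip(predicted, true)) / len(true)
```

For example, `binned_r2(pairs, [1000, 10000, 100000])` would skip SNP pairs closer than 1,000 bp, mirroring the omission of short-distance bins described in the caption.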