
Mutation Rate Variation Across Genomic Regions in Arabidopsis thaliana

Elisa Heinrich-Mora, Marcus W. Feldman

TL;DR

The results indicate that mutation rate is systematically structured by chromatin state within functionally constrained genes and suggest that evolutionary processes may act not only on expected mutation rate but also on its variability across loci.

Abstract

In population genetics, mutation rate is often treated as a homogeneous parameter across the genome. Empirical evidence, however, shows systematic variation across genomic contexts associated with chromatin organization and epigenomic features. Using gene-level de novo mutation data from Arabidopsis thaliana, we test whether chromatin features predict not only the mean per-base mutation rate but also its variability across genes. To reduce heterogeneity in selective regime, we restrict analysis to essential and lethal loci subject to strong purifying selection. Across complementary multivariable models including heteroskedasticity-robust linear regression, length-weighted regression, and Poisson generalized linear models with exposure offsets, histone marks associated with active transcription (H3K4me1, H3K4me3, H3K36ac) are consistently associated with lower mean mutation rates and substantially reduced between-gene variance. GC content shows little association with the mean once chromatin predictors are controlled but is positively associated with mutation-rate variability. Estimates of skewness and kurtosis reveal no significant higher-order structure attributable to epigenomic predictors. A standardized Tajima's $D$ statistic yields directionally consistent but statistically underpowered associations with both the mean and variance of gene-level mutation rates. These results indicate that mutation rate is systematically structured by chromatin state within functionally constrained genes and suggest that evolutionary processes may act not only on expected mutation rate but also on its variability across loci.
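The Poisson model with an exposure offset treats each gene's de novo mutation count as Poisson with mean equal to its number of bases at risk times a per-base rate, so that log E[count] = log(exposure) + Xβ. The sketch below fits that model by iteratively reweighted least squares (IRLS) in NumPy. The function name, the use of a per-gene "exposure" such as callable length, and the assumption that column 0 of X is an intercept are illustrative choices, not the authors' code.

```python
import numpy as np

def poisson_glm_offset(X, y, exposure, n_iter=100, tol=1e-10):
    """Fit log E[y_i] = log(exposure_i) + X_i @ beta by IRLS.

    X        : (n, p) design matrix; column 0 is assumed to be an intercept.
    y        : (n,) de novo mutation counts per gene.
    exposure : (n,) bases at risk per gene (e.g. callable length; hypothetical).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    offset = np.log(exposure)
    beta = np.zeros(X.shape[1])
    # Start the intercept at the pooled per-base rate (requires at least one count).
    beta[0] = np.log(y.sum() / exposure.sum())
    for _ in range(n_iter):
        eta = offset + X @ beta
        mu = np.exp(eta)
        # Working response (offset removed) and weights for the canonical log link.
        z = X @ beta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

Because the offset enters with a fixed coefficient of 1, exp(β_j) can be read as the multiplicative change in the per-base rate per one-standard-deviation increase in a standardized predictor j.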

Paper Structure (20 sections, 26 equations, 3 figures)

Figures (3)

  • Figure 1: Partial associations for the stable predictor set (mean model). Partial-effect curves from the multivariable OLS model with HC3 robust inference, restricted to essential/lethal genes. Each curve shows the expected per-base mutation rate as a single standardized predictor varies within $[-2,2]$ standard deviations, while all other predictors are fixed at their standardized means (0). Shaded regions denote 95% heteroskedasticity-robust confidence intervals. Rug marks along the horizontal axis indicate the empirical distribution of each predictor within the plotted range; denser rugs correspond to regions with more genes and therefore stronger empirical support.
  • Figure 2: Predictors $\times$ moments of the per-base mutation rate (essential/lethal genes). Each cell reports the estimated regression coefficient for a given predictor (row) and moment (column). "Mean" shows partial effects on the expected per-base mutation rate from the multivariable OLS model. "Variance" reports the percent change in residual variance per one-standard-deviation increase in the predictor, based on a log-variance regression of $\log(\hat{\varepsilon}_i^2)$ on the same predictors. "Skewness" and "Kurtosis" report coefficients from regressions on standardized-residual moments ($u_i^3$ and $u_i^4-3$, respectively), where $u_i$ denotes residuals divided by the fitted standard deviation (the square root of the fitted variance) and clipped at the 0.5th and 99.5th percentiles to limit outlier leverage. Colors represent column-wise standardized effect sizes (z-scores; for visualization only), while annotations show raw effect estimates. Stars denote Benjamini-Hochberg FDR significance within each moment. The strongest and most consistent pattern is variance suppression associated with active-chromatin marks (H3K4me1, H3K4me3, H3K36ac); effects on skewness and kurtosis are not statistically supported.
  • Figure 3: Selection proxy versus predictors and mutation-rate moments in essential/lethal genes. Selection intensity is indexed by $S = -z(\mathrm{Tajima's}\ D)$, so larger $S$ corresponds to more negative Tajima’s $D$ (i.e., stronger inferred purifying selection). The sign reversal is for interpretive clarity only. (A) Partial associations between each standardized predictor and $S$, from regressions $X_j^\ast \sim S + X_{-j}^\ast$ with heteroskedasticity-robust 95% confidence intervals. Lines show adjusted predictions (others fixed at 0); points are binned means $\pm$ SEM. (B) Predicted mean and higher-moment summaries from models including $S$ and chromatin predictors (others fixed at 0). Curves are standardized over the plotted $S$ range for display only. Shaded regions denote robust 95% confidence intervals.
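The figure captions describe several regressions on one standardized design matrix: an OLS mean model with HC3 standard errors, a log-variance regression on $\log(\hat{\varepsilon}_i^2)$, regressions on clipped standardized-residual moments, Benjamini-Hochberg control within each moment, and the proxy $S = -z(\mathrm{Tajima's}\ D)$. The NumPy sketch below reproduces that sequence under stated assumptions: column 0 of X is an intercept, p-values use a normal approximation (the captions do not say how they were computed), standardized residuals are rescaled to unit mean square to absorb the known bias of the log-squared-residual intercept, and D is z-scored with ddof=1. All function names are hypothetical.

```python
import math
import numpy as np

def ols_hc3(X, y):
    """OLS fit with HC3 heteroskedasticity-robust covariance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix X (X'X)^{-1} X'.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    omega = resid**2 / (1.0 - h) ** 2
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
    return beta, cov, resid

def partial_effect(beta, cov, j, grid):
    """Prediction and 95% CI as predictor j varies over grid, others fixed at 0."""
    x0 = np.zeros((len(grid), beta.size))
    x0[:, 0] = 1.0  # intercept column
    x0[:, j] = grid
    pred = x0 @ beta
    se = np.sqrt(np.einsum("ij,jk,ik->i", x0, cov, x0))
    return pred, pred - 1.96 * se, pred + 1.96 * se

def wald_p(beta, cov):
    """Two-sided normal-approximation p-values for each coefficient."""
    z = beta / np.sqrt(np.diag(cov))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        reject[order[: k + 1]] = True
    return reject

def moment_regressions(X, y):
    """Mean, log-variance, skewness and excess-kurtosis regressions on the same X."""
    beta, cov_mean, resid = ols_hc3(X, y)
    # Variance model: regress log squared residuals on X (assumes no residual is exactly 0).
    gamma, cov_var, _ = ols_hc3(X, np.log(resid**2))
    pct_var = 100.0 * (np.exp(gamma) - 1.0)  # % change in variance per 1-SD step
    # Divide residuals by the fitted SD, rescale to unit mean square, then clip.
    u = resid / np.sqrt(np.exp(X @ gamma))
    u = u / np.sqrt(np.mean(u**2))
    u = np.clip(u, *np.percentile(u, [0.5, 99.5]))
    skew_b, cov_skew, _ = ols_hc3(X, u**3)
    kurt_b, cov_kurt, _ = ols_hc3(X, u**4 - 3.0)
    out = {}
    for name, b, c in [("mean", beta, cov_mean), ("variance", gamma, cov_var),
                       ("skewness", skew_b, cov_skew), ("kurtosis", kurt_b, cov_kurt)]:
        # FDR control within each moment, over the non-intercept predictors.
        out[name] = (b, bh_reject(wald_p(b, c)[1:]))
    out["variance_pct"] = pct_var
    return out

def selection_proxy(tajima_d):
    """S = -z(Tajima's D): larger S means more negative D."""
    d = np.asarray(tajima_d, dtype=float)
    return -(d - d.mean()) / d.std(ddof=1)
```

The same `ols_hc3` helper could fit the Figure 3 panel A regressions $X_j^\ast \sim S + X_{-j}^\ast$ by placing $S$ and the remaining predictors in the design matrix; only the choice of response changes.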