Quantifying Broken Detailed Balance in Transcription

James Holehouse

TL;DR

This study derives exact analytic expressions for the mesoscopic entropy production rate $\dot{s}_{\mathrm{mes}}$ in the canonical two-state telegraph model of transcription and applies them to thousands of genes across seven datasets. The key result is a closed-form EPR, $\dot{s}_{\mathrm{mes}}=\frac{(\rho_{\mathrm{on}}-\rho_{\mathrm{off}})\sigma_{\mathrm{on}}\sigma_{\mathrm{off}}}{(\sigma_{\mathrm{on}}+\sigma_{\mathrm{off}})(d+\sigma_{\mathrm{on}}+\sigma_{\mathrm{off}})} \ln\left(\frac{\rho_{\mathrm{on}}}{\rho_{\mathrm{off}}}\right)$, with an alternative burst-size form using $B=\rho_{\mathrm{on}}/\sigma_{\mathrm{off}}$. Across seven real datasets, genes tend to occupy parameter regions with modest $\dot{s}_{\mathrm{mes}}$, suggesting a mesoscopic energy-expenditure minimization, though the mesoscopic bound is not a tight thermodynamic bound. The work also shows how coarse-graining can mask irreversibility, and how extrinsic noise and cell-to-cell variability can alter population-level irreversibility, highlighting the need for careful interpretation of single-cell vs population data in transcription thermodynamics.
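
As a minimal sketch, the closed-form expression above can be evaluated directly for a single gene. The function name, the argument order, and the reading of the two factors as a cycle flux and a cycle affinity are illustrative assumptions, not notation taken from the paper. The example parameters are the ones quoted in the Figure 3 caption.

```python
import math


def epr_mesoscopic(rho_on, rho_off, sigma_on, sigma_off, d):
    """Closed-form steady-state mesoscopic EPR of the two-state telegraph model.

    rho_on, rho_off     : mRNA production rates in the on and off gene states
    sigma_on, sigma_off : gene switching rates
    d                   : mRNA degradation rate
    All rates must be strictly positive so that the logarithm is defined.
    """
    if min(rho_on, rho_off, sigma_on, sigma_off, d) <= 0:
        raise ValueError("all rates must be strictly positive")
    switching = sigma_on + sigma_off
    # Prefactor of the logarithm (interpreted here as a net cycle flux).
    cycle_flux = (rho_on - rho_off) * sigma_on * sigma_off / (
        switching * (d + switching)
    )
    # Logarithmic factor (interpreted here as a cycle affinity).
    affinity = math.log(rho_on / rho_off)
    return cycle_flux * affinity


# Parameters from the Figure 3 caption.
fig3_epr = epr_mesoscopic(
    rho_on=30.0, rho_off=1 / 3, sigma_on=0.5, sigma_off=0.5, d=1.0
)
print(fig3_epr)
```

Both factors change sign together when $\rho_{\mathrm{on}}$ and $\rho_{\mathrm{off}}$ are swapped, so the result is never negative, and it vanishes when the two production rates coincide.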

Abstract

For the canonical two-state model of transcription, we derive exact analytic expressions for the entropy production rate of transcription at steady state, and assess detailed balance breaking in transcription. Our analytics allow us to easily evaluate the entropy production rate of thousands of genes across seven datasets of two-state model parameters without needing to evaluate the entropy production rate from trajectory-based computation. A data-driven approach then reveals that most genes avoid parameter regimes associated with large entropy production rates, akin to a mesoscopic version of energy expenditure minimization. Importantly, we show that this is not a thermodynamic phenomenon, since the entropy production rate from the two-state gene model provides only a weak bound on the housekeeping energy needed to power transcription. Finally, we show that cell-to-cell variability can make mRNA expression seem more or less irreversible than a "representative cell" would imply.

Paper Structure (6 sections, 35 equations, 24 figures)

Figures (24)

  • Figure 1: Schematics of the models considered in this study. (a) Illustration of a eukaryotic cell with the most important molecules and processes labeled. The gray box shows the model of transcription considered in this study. (b) Typical modeling and inference schemes assume populations of identical cells and that most cellular noise is intrinsic. In contrast, a population of heterogeneous cells has varying features, such as cell size or activator concentrations, leading to distributions of kinetic parameters over the population [lim2015quantitative, grima2023quantifying].
  • Figure 2: Markov state diagram for the reaction scheme in Eq. \ref{eq:modTM} for states up to mRNA number $n=3$. Arrow labels indicate the propensity with which transitions between states occur. Since every reaction can occur in both directions, the telegraph model is a dynamically reversible non-equilibrium system. Generally, detailed balance is not satisfied.
  • Figure 3: Exploring the non-equilibrium steady state of the two-state model. (a) Plots of $P_0(n)$, $P_1(n)$ and $P(n)$ calculated using the generating functions in the main text for parameters $\rho_{\mathrm{off}} = 1/3, \rho_{\mathrm{on}}=30, \sigma_{\mathrm{off}} = \sigma_{\mathrm{on}} = 1/2,$ and $d = 1$. (b) A diagram showing the flow of probability flux between states in the Markov diagram. Variables $b_n$ and $a_n$ represent probability fluxes, whose expressions are given in the main text. Here $b_n>0$, whereas the $a_n$ can be positive or negative.
  • Figure 4: Investigating detailed balance breaking via $\dot s_{\mathrm{mes}}$ by comparing real parameter sets with null models 1 and 2 for G1 cell-cycle stage mouse fibroblasts in [sukys2025cell]. (a) Histogram over $\dot s_{\mathrm{mes}}$ for 1,436 real parameter sets (in gray), 1,436 parameter sets under null model 1 (in blue), and 1,436 parameter sets under null model 2 (in pink). Dashed lines show the respective mean values for each histogram. (b) Plot of $\dot s_{\mathrm{mes}}$ versus $\sigma_{\mathrm{off}}/\sigma_{\mathrm{on}}$ for real parameter sets. Each marker represents the position of a single gene in the phase space, and the $r$ value in the plot title is Pearson's correlation coefficient between the logarithm of the $x$ and $y$ variables. (c)-(d) Plots of $\dot s_{\mathrm{mes}}$ versus $\sigma_{\mathrm{off}}/\sigma_{\mathrm{on}}$ under null models 1 and 2 respectively. (e) Plot of $\dot s_{\mathrm{mes}}$ versus $CV^2$ for real parameter sets. (f)-(g) Plots of $\dot s_{\mathrm{mes}}$ versus $CV^2$ under null models 1 and 2 respectively.
  • Figure 5: Assessing the dependence of $\langle\dot s_{\mathrm{mes}}\rangle$ on real kinetic parameter combinations across 1,436 genes for G1 cell-cycle stage mouse fibroblasts in [sukys2025cell]. Here, $\langle \dot s_{\mathrm{mes}}\rangle$ is the mean EPR for binned values of real kinetic parameter sets. In all plots, pink lines correspond to the mode of the frequency distribution over a given kinetic parameter. (a) Binning genes across $\log_{10}(B)$ shows an increasing dependence of $\langle \dot s_{\mathrm{mes}}\rangle$ on $\log_{10}(B)$. The modal value of $\log_{10}(B)$ occurs well before the rapid increase of $\langle \dot s_{\mathrm{mes}}\rangle$. (b) Binning genes across $\log_{10}(\sigma_{\mathrm{on}})$, with similar conclusions as for panel (a). (c) Binning genes across $\log_{10}(\sigma_{\mathrm{off}})$ shows a surprising feature: the modal values of $\log_{10}(\sigma_{\mathrm{off}})$ occur before and after the peak in $\langle \dot s_{\mathrm{mes}}\rangle$. (d) For $\log_{10}(\sigma_{\mathrm{off}}/\sigma_{\mathrm{on}})$, similar conclusions are reached as for panel (c).
  • ...and 19 more figures
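
The Markov state diagram and probability fluxes described in the captions of Figures 2 and 3 suggest a direct numerical cross-check of the closed-form EPR quoted in the TL;DR. The sketch below builds a truncated master equation for the two-state model, solves for its steady state, and sums net flux times log rate ratio over all transitions. For a finite chain at steady state this sum equals the entropy production rate. The state indexing, the truncation level `n_max`, and all function names are illustrative assumptions, not the paper's code.

```python
import numpy as np


def closed_form_epr(rho_on, rho_off, sigma_on, sigma_off, d):
    """Closed-form mesoscopic EPR as quoted in the TL;DR."""
    s = sigma_on + sigma_off
    flux = (rho_on - rho_off) * sigma_on * sigma_off / (s * (d + s))
    return flux * np.log(rho_on / rho_off)


def telegraph_rates(rho_on, rho_off, sigma_on, sigma_off, d, n_max):
    """W[j, i] is the rate of the jump i -> j; state index is 2*n + gene state."""
    size = 2 * (n_max + 1)
    W = np.zeros((size, size))
    for n in range(n_max + 1):
        off, on = 2 * n, 2 * n + 1
        W[on, off] = sigma_on    # gene switches on
        W[off, on] = sigma_off   # gene switches off
        for state, rho in ((off, rho_off), (on, rho_on)):
            if n < n_max:
                W[state + 2, state] = rho    # production, n -> n + 1
            if n > 0:
                W[state - 2, state] = d * n  # degradation, n -> n - 1
    return W


def steady_state(W):
    """Solve generator @ p = 0 with the normalisation sum(p) = 1."""
    generator = W - np.diag(W.sum(axis=0))
    A = generator.copy()
    A[-1, :] = 1.0  # replace one redundant balance equation by normalisation
    rhs = np.zeros(W.shape[0])
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)


def numerical_epr(W, p):
    """Sum of net flux times log rate ratio; the 0.5 corrects double counting."""
    flux = W * p[np.newaxis, :]  # flux[j, i] = W[j, i] * p[i]
    net = flux - flux.T
    mask = W > 0                 # every transition here has a reverse partner
    log_ratio = np.log(W[mask] / W.T[mask])
    return 0.5 * np.sum(net[mask] * log_ratio)


# Parameters from the Figure 3 caption; n_max = 120 is an assumed cutoff.
params = dict(rho_on=30.0, rho_off=1 / 3, sigma_on=0.5, sigma_off=0.5, d=1.0)
W = telegraph_rates(**params, n_max=120)
p = steady_state(W)
print(numerical_epr(W, p), closed_form_epr(**params))
```

The truncation introduces an error of the order of the probability mass beyond `n_max`, so the cutoff should sit well above the largest mRNA numbers the chosen parameters can reach.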