$α$-divergence Improves the Entropy Production Estimation via Machine Learning

Euijoon Kwon; Yongjoo Baek

TL;DR

Problem: estimating trajectory-level entropy production (EP) from stochastic trajectories. Approach: introduce the $α$-NEEP, which replaces the KL-based loss of the original NEEP with a variational $α$-divergence loss $L_α$, forming a family of estimators parameterized by $α$ (the original NEEP corresponds to $α = 0$). Key finding: for $-1 < α < 0$, and especially at $α = -0.5$, the estimator is more robust to strong nonequilibrium driving and sampling noise than the KL-based case. Rationale: a simple Gaussian model provides analytic insight into why $α = -0.5$ minimizes both the bias in the loss landscape and the gradient fluctuations. Impact: broadens the applicability of EP estimation in nonequilibrium thermodynamics and guides loss design for trajectory-level stochastic thermodynamic quantities.

Abstract

Recent years have seen a surge of interest in the algorithmic estimation of stochastic entropy production (EP) from trajectory data via machine learning. A crucial element of such algorithms is the identification of a loss function whose minimization guarantees accurate EP estimation. In this study, we show that there exists a host of loss functions, namely those implementing a variational representation of the $α$-divergence, which can be used for EP estimation. By fixing $α$ to a value between $-1$ and $0$, the $α$-NEEP (Neural Estimator for Entropy Production) exhibits much more robust performance against strong nonequilibrium driving or slow dynamics, both of which adversely affect the existing method based on the Kullback-Leibler divergence ($α = 0$). In particular, the choice of $α = -0.5$ tends to yield the best results. To corroborate our findings, we present an exactly solvable simplification of the EP estimation problem, whose loss function landscape and stochastic properties give deeper intuition into the robustness of the $α$-NEEP.
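
For orientation, one loss that is consistent with this description can be written down explicitly. The form below is an assumption, since the abstract does not state $L_α$; the exact expression and normalization used in the paper may differ. Here $s_θ$ is a hypothetical estimator output that changes sign under time reversal of the trajectory, $\langle \cdot \rangle$ is an average over sampled forward trajectories, and $p$ and $\tilde{p}$ denote the probabilities of a trajectory and of its time reverse.

$$
L_α[s_θ] = \left\langle -\frac{e^{α s_θ} - 1}{α} + \frac{e^{-(1+α) s_θ} - 1}{1+α} \right\rangle, \qquad -1 < α < 0 .
$$

Under this assumed form, the limit $α \to 0$ gives $\langle -s_θ + e^{-s_θ} - 1 \rangle$, a KL-type loss, while $α = -0.5$ gives $4 \langle e^{-s_θ/2} - 1 \rangle$. Collecting the contributions of a trajectory and its time reverse and setting the derivative with respect to $s$ to zero yields

$$
p \left[ e^{α s} + e^{-(1+α) s} \right] = \tilde{p} \left[ e^{-α s} + e^{(1+α) s} \right] = \tilde{p} \, e^{s} \left[ e^{α s} + e^{-(1+α) s} \right],
$$

so the minimizer satisfies $s = \ln (p / \tilde{p})$ for every $α$ in the family, which is the log ratio of forward and reverse path probabilities that defines the stochastic EP.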

Paper Structure (12 sections, 27 equations, 9 figures)

Introduction
Overview of the Original NEEP
Formulation of the $\alpha$-NEEP
Examples
Simple Gaussian Model
Summary and outlook
Training details
Density ratio estimation via $f$-divergence
Extra numerical results
Coefficient of determination
Effects of the minibatch size
Effects of overfitting

Figures (9)

Figure 1: Schematic illustration of the neural-network implementation of the $\alpha$-NEEP.
Figure 2: (a) Illustration of the two-bead model. (b) Mean square error (MSE) of the EP estimate for various temperature differences. (c) Ratio between the estimated value $\sigma_\mathrm{pred}$ and the true value $\sigma$ of average EP for the two-bead model. Temperature of the cold bath is fixed at $T_\mathrm{c} = 1$. Each data point and error bar are obtained from $40$ independent trainings.
Figure 3: (a) Illustration of the Brownian gyrator. Circles represent the equipotential lines and the dashed arrows indicate the directions of the nonconservative driving. (b) MSE of the EP estimate for the Brownian gyrator model as the magnitude of nonconservative force, $\varepsilon = -\delta$, is varied. (c) Ratio between the estimated value $\sigma_\mathrm{pred}$ and the true value $\sigma$ of average EP for the Brownian gyrator. Temperatures are fixed at $T_\mathrm{h} = 10$ and $T_\mathrm{c} = 1$. Each data point and error bar are obtained from $40$ independent trainings.
Figure 4: (a) Illustration of the driven Brownian particle. (b) MSE of the EP estimate for the driven Brownian particle as the potential depth $A$ is varied. (c) Ratio between the estimated value $\sigma_\mathrm{pred}$ and the true value $\sigma$ of the average EP for the driven Brownian particle. Strength of the nonequilibrium driving is fixed at $f = 32$ and the temperature at $T = 1$. Each data point and error bar are obtained from $40$ independent trainings.
Figure 5: Performance of the exactly solvable one-parameter model. (a) Shift $\Delta \theta$ of the loss function minimum as a function of the truncation parameter $k$. Circles are results obtained by numerical minimization, and solid lines are from the small $1/k$ expansion. (Inset) Loss function landscapes, with circles indicating the minima. We fixed $\mu=3$, $\sigma=1$, and $\alpha=0$. (b) Ratio of the estimated minimum $\theta^*$ to the true minimum $\theta_0$ as the bias $\mu$ is varied. The optimal points are calculated using the criterion that the loss function gradient satisfies $|\partial_\theta \mathcal{L}_\alpha(\theta)| < 10^{-3}$ for the first time as $\theta$ increases from $0$. We fixed $k=4$ and $\sigma=1$. (Inset) Loss function landscape. Open diamonds indicate the true minima $\theta_0$, and the filled diamonds represent the estimated minima $\theta^*$. The parameters $\alpha = -0.5$ and $k=4$ are fixed. (c) MSE of $\theta$. The vertical dashed line shows that the error is minimized at $\alpha=-0.5$. (d) Distribution of the loss function gradient $\partial_\theta \mathcal{L}_\alpha$ at the minimum $\theta_0 = 2$ for $\mu = \sigma = 1$.
...and 4 more figures
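
The one-parameter Gaussian picture referenced in the Figure 5 caption can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the estimator is taken to be $s_θ(x) = θ x$ with $x$ drawn from a Gaussian of mean $\mu$ and standard deviation $\sigma$, the loss is taken to be $\langle -(e^{α s} - 1)/α + (e^{-(1+α) s} - 1)/(1+α) \rangle$ (one form consistent with a variational $α$-divergence loss, not necessarily the paper's exact expression), and the truncation parameter $k$ from the caption is not modeled. The function names are hypothetical. With these assumptions the expected loss has a closed form, and its minimum should sit at $θ_0 = 2\mu/\sigma^2$, which matches the value $θ_0 = 2$ quoted for $\mu = \sigma = 1$.

```python
import numpy as np


def gaussian_mgf(a, mu, sigma):
    """Closed-form E[exp(a * x)] for x ~ N(mu, sigma^2)."""
    return np.exp(a * mu + 0.5 * (a * sigma) ** 2)


def expected_alpha_loss(theta, alpha, mu=1.0, sigma=1.0):
    """Expected loss for the assumed estimator s_theta(x) = theta * x.

    Assumed loss form (illustrative, not taken from the source):
        < -(exp(alpha*s) - 1)/alpha + (exp(-(1+alpha)*s) - 1)/(1+alpha) >
    valid for -1 < alpha < 0, with alpha = 0 handled as the KL-type limit.
    """
    theta = np.asarray(theta, dtype=float)
    if alpha == 0.0:
        # Limit alpha -> 0: < -s + exp(-s) - 1 >
        return -theta * mu + gaussian_mgf(-theta, mu, sigma) - 1.0
    first = -(gaussian_mgf(alpha * theta, mu, sigma) - 1.0) / alpha
    second = (gaussian_mgf(-(1.0 + alpha) * theta, mu, sigma) - 1.0) / (1.0 + alpha)
    return first + second


def grid_minimizer(alpha, mu=1.0, sigma=1.0):
    """Locate the minimum of the expected loss on a uniform theta grid."""
    thetas = np.linspace(0.0, 5.0, 5001)  # grid spacing 0.001
    losses = expected_alpha_loss(thetas, alpha, mu, sigma)
    return thetas[np.argmin(losses)]


if __name__ == "__main__":
    for alpha in (-0.9, -0.5, -0.1, 0.0):
        print(f"alpha = {alpha:+.1f}, minimizing theta = {grid_minimizer(alpha):.3f}")
```

In this noiseless, untruncated setting every $α$ recovers the same minimum, so the sketch only checks that the assumed loss family is unbiased in expectation. The differences between values of $α$ described in the caption arise from truncation and finite sampling, which this sketch deliberately leaves out.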
