Geometry and universal scaling of Pareto-optimal signal compression

Jonas Berx

Abstract

I investigate the generic problem of lossy compression of a fluctuating stochastic signal $X$ into a discrete representation $Z$ through optimal thresholding. The signal modulates transition rates of a two-state system described by a binary variable $Y$. Optimising the retained mutual information between $Z$ and $Y$ under a constraint on fixed encoding cost of $Z$ reveals Pareto-optimal trade-offs, determined numerically using genetic algorithms. In the small-noise regime, these fronts are either concave or exhibit piecewise convex ``intrusions'' separated by first-order transitions in the optimal protocol. An analytical high-rate expansion shows that the optimal threshold density follows a universal cube-root scaling with the product of the prior distribution and the Fisher information associated with the response, which holds qualitatively even for few discrete states. Extending the analysis to non-Gaussian fluctuations reveals that for some parameters optimal encoders can yield strictly better information-cost trade-offs than Gaussian surrogates, meaning the same information content can often be achieved with fewer discrete readout states.

Paper Structure

This paper contains 11 equations, 4 figures.

Figures (4)

Figure 1: (a) Single realisation of the steady-state Gaussian input signal fluctuations $\mathcal{U}$ as a function of time. Dashed vertical lines indicate possible scenarios in which the fluctuation is either negative (left) or positive (right) with respect to the steady-state mean (dashed black line). (b) These input signals modulate transition rates $k^\pm$ between the states $Y\in\{0,1\}$ in a quasi-potential landscape $\Delta G$. (c) The Gaussian fluctuations $P_\mathcal{U}(u)$ (black, scaled by $\sqrt{2\pi}$) superimposed on the response curve $w(u)$ (light green). Steeper (dashed green) and shifted (dark green) response functions are also shown.
Figure 2: Pareto-optimal trade-offs for optimal binning with Gaussian prior for $\kappa = 0$ (a-d) and $\kappa=3$ (e-h), varying the sensitivity $\lambda$ (green: $\lambda =3$, purple: $\lambda =1$, black: $\lambda =1/2$). (a, e) Pareto fronts show that for $\kappa=0$ and $\lambda > 1$, the optimal trade-off features sharp 'corners' (open circles) indicating stable encoding choices. These corners seem absent when $\lambda \to 0$ or for $|\kappa|>0$. The maximum information $I(X,Y)$ is shown by the coloured dashed lines, with the gray region representing the unachievable bound $I(Z,Y) = H(Z)$. (b, f) The convex hull of the fronts traces phase transitions between optimal bin numbers $k$, which may correspond to non-uniform bin edges. (c, g) Optimal bin edges (with open circles from Lloyd's algorithm) and (d, h) the resulting bin allocation for $M=5$ (shaded regions) are displayed for $\lambda=3$. Optimal allocation is compared with the predicted edge density \ref{eq:scaling_law} (full lines) and response curves \ref{eq:response_curve} (dashed lines). Vertical gridlines in (a, c, e, g) and horizontal ones in (b, f) denote the values of $H_k = \log_2{k}$.
Figure 3: Information-cost Pareto fronts for non-Gaussian priors, with (a) $\lambda=3,\,\kappa=0$ and (b) $\lambda=1,\,\kappa=3$. For $\kappa=0$ the Gaussian limit yields the globally best trade-off. Conversely, for higher $|\kappa|$ a prior with finite $\mu$ can yield the optimal curve.
Figure 4: Numerically computed Pareto-optimal distribution of bin edges (thin vertical lines) for $M=20$, superimposed on the priors $P_\mathcal{U}$ (full lines) and response functions $w(u)$ (dashed lines). The top row of figures shows the corresponding bin edge density $\rho(u)$ from \ref{eq:scaling_law}, which agrees well with the numerical results. (a) Gaussian prior $\mu\rightarrow\infty$ with $\lambda=3,\,\kappa=0$; (b) Laplacian prior $\mu=1$ with $\lambda=\kappa=2$; (c) product-normal prior $\mu=1/2$ with $\lambda=1,\,\kappa=3$.
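
The cube-root scaling stated in the abstract, where the threshold density is proportional to the cube root of the prior times the Fisher information of the response, can be illustrated with a short numerical sketch. The paper's response curve is not reproduced on this page, so the block below assumes a hypothetical logistic form $w(u) = 1/(1+e^{-(\lambda u - \kappa)})$ and a standard Gaussian prior. The function names, the grid settings and this parametrisation of $\lambda$ and $\kappa$ are illustrative assumptions, not the author's implementation. The Fisher information used is the standard one for a Bernoulli variable with success probability $w(u)$, namely $w'(u)^2/[w(1-w)]$. The sketch places $M-1$ edges at equal quantiles of the density and then evaluates the resulting information $I(Z,Y)$ and cost $H(Z)$ in bits.

```python
import numpy as np

def response(u, lam, kappa):
    """Hypothetical logistic response w(u) = P(Y=1 | U=u) (an assumption)."""
    return 1.0 / (1.0 + np.exp(-(lam * u - kappa)))

def gaussian_prior(u):
    """Standard Gaussian prior P_U(u) (unit variance is an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def edge_density(u, lam, kappa):
    """Unnormalised cube-root density rho(u) ~ [P_U(u) J(u)]^(1/3)."""
    w = response(u, lam, kappa)
    # Bernoulli Fisher information J = w'^2 / (w (1 - w));
    # for the logistic form this reduces to lam^2 * w * (1 - w).
    fisher = lam**2 * w * (1.0 - w)
    return np.cbrt(gaussian_prior(u) * fisher)

def edges_from_density(M, lam, kappa, u_max=8.0, n_grid=20001):
    """Place M-1 bin edges at equal quantiles of the edge density."""
    u = np.linspace(-u_max, u_max, n_grid)
    rho = edge_density(u, lam, kappa)
    # Cumulative trapezoidal integral of rho, normalised to [0, 1].
    steps = 0.5 * (rho[1:] + rho[:-1]) * np.diff(u)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    targets = np.arange(1, M) / M
    return np.interp(targets, cdf, u)

def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable in bits."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def info_and_cost(edges, lam, kappa, u_max=8.0, n_grid=20001):
    """Return (I(Z,Y), H(Z)) in bits for the thresholding given by `edges`."""
    u = np.linspace(-u_max, u_max, n_grid)
    weights = gaussian_prior(u)
    weights /= weights.sum()                 # discretised prior
    w = response(u, lam, kappa)
    z = np.searchsorted(edges, u)            # bin index of each grid point
    n_bins = len(edges) + 1
    pz = np.bincount(z, weights=weights, minlength=n_bins)
    py1_given_z = np.bincount(z, weights=weights * w, minlength=n_bins) / pz
    py1 = np.sum(weights * w)
    cost = -np.sum(pz * np.log2(pz))         # encoding cost H(Z)
    info = binary_entropy(py1) - np.sum(pz * binary_entropy(py1_given_z))
    return info, cost

edges = edges_from_density(M=5, lam=3.0, kappa=0.0)
info, cost = info_and_cost(edges, lam=3.0, kappa=0.0)
print("edges:", edges)
print("I(Z,Y) =", info, "bits, H(Z) =", cost, "bits")
```

Edges obtained this way follow the high-rate prediction only; they are not the Pareto-optimal edges found by the genetic algorithm or by Lloyd's algorithm in the paper, and for small $M$ the two can differ.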