SVD-based Causal Emergence for Gaussian Iterative Systems

Kaiwei Liu, Linli Pan, Zhipeng Wang, Mingzhe Yang, Bing Yuan, Jiang Zhang

TL;DR

The paper addresses the challenge of identifying causal emergence without prespecifying coarse-grainings by introducing an SVD-based CE framework for Gaussian iterative systems. It defines approximate dynamical reversibility through the spectra of the inverse covariance matrices $\Sigma^{-1}$ and $A^T\Sigma^{-1}A$, and shows that CE can be quantified directly as $\Delta\Gamma_\alpha(\epsilon)$ while correlating analytically with EI-based CE. The authors prove an approximate linear relation $\Delta\Gamma_\alpha(\epsilon) \approx (1-\alpha/4)\Delta\mathcal{J}^*$ and provide a robust, two-round SVD coarse-graining procedure to derive macro-dynamics from the singular-value structure. They validate the theory across linear GIS, a nonlinear extension via local linearization, and a neural-network–driven SIR model, illustrating practical applicability to continuous-state systems with Gaussian noise and emphasizing covariance and reversibility over dynamical functions. Overall, the work offers a scalable, interpretable CE diagnostic for continuous systems and establishes a bridge between SVD-based and EI-based CE notions with potential for real-world data analysis, including meteorology and neuroscience.

Abstract

Causal emergence (CE) based on effective information (EI) demonstrates that macro-states can exhibit stronger causal effects than micro-states in dynamics. However, the identification of CE and the maximization of EI both rely on coarse-graining strategies, which is a key challenge. A recently proposed CE framework based on approximate dynamical reversibility, utilizing singular value decomposition (SVD), is independent of coarse-graining. Still, it is limited to transition probability matrices (TPM) in discrete states. To address this, this article proposes a novel CE quantification framework for Gaussian iterative systems (GIS), based on approximate dynamical reversibility derived from the SVD of inverse covariance matrices in forward and backward dynamics. The positive correlation between SVD-based and EI-based CE, along with the equivalence condition, is given analytically. After that, we provide precise coarse-graining strategies directly from singular value spectra and orthogonal matrices. This new framework can be applied to any dynamical system with continuous states and Gaussian noise, such as auto-regressive growth models, Markov-Gaussian systems, and even SIR modeling using neural networks (NN). Numerical simulations on typical cases validate our theory and offer a new approach to studying the CE phenomenon, emphasizing noise and covariance over dynamical functions in both known models and machine learning.
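
As a minimal sketch of the two objects the framework starts from, the snippet below computes the singular value spectra of the forward inverse covariance $\Sigma^{-1}$ and the backward inverse covariance $A^T\Sigma^{-1}A$ for a small linear GIS, then counts how many singular values exceed a threshold $\epsilon$. The matrices `A` and `Sigma`, the threshold value, and the helper name `inverse_covariance_spectra` are hypothetical choices made for illustration; they are not taken from the paper.

```python
import numpy as np

def inverse_covariance_spectra(A, Sigma):
    """Singular values of A^T Sigma^{-1} A (backward) and Sigma^{-1} (forward)."""
    Sigma_inv = np.linalg.inv(Sigma)
    backward_precision = A.T @ Sigma_inv @ A
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(backward_precision, compute_uv=False)
    kappa = np.linalg.svd(Sigma_inv, compute_uv=False)
    return s, kappa

# Hypothetical 3-dimensional system: the third direction is almost degenerate.
A = np.diag([1.2, 0.9, 0.01])
Sigma = 0.25 * np.eye(3)
epsilon = 1.0  # illustrative threshold, not a value from the paper

s, kappa = inverse_covariance_spectra(A, Sigma)
n_backward = int(np.sum(s > epsilon))     # backward singular values above epsilon
n_forward = int(np.sum(kappa > epsilon))  # forward singular values above epsilon

print("backward spectrum:", s)
print("forward spectrum: ", kappa)
print("above threshold (backward, forward):", n_backward, n_forward)
```

In this toy system only two of the three backward singular values exceed the threshold, which is the kind of spectral gap the framework reads as a signal that a lower-dimensional macro-description may exist.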

Paper Structure

This paper contains 55 sections, 9 theorems, 149 equations, 9 figures.

Key Result

Corollary Appendix B.1

For linear stochastic iteration systems such as Equation (micro-dynamics-appendix), the maximum effective information of the system after coarse-graining $y_t=\phi(x_t)=Wx_t$, $W\in \mathcal{R}^{r\times n}$, is calculated as

Figures (9)

  • Figure 1: For the 1-dimensional Gaussian iterative system $x_{t+1}=ax_t+\eta_t, \eta_t\sim\mathcal{N}(0,\sigma^2)$, we examine the dynamics under three parameter settings and two intervention distributions $p(x_t)$, showing the resulting distributions $p(x_{t+1})$ and the corresponding dimension-averaged causal effect ($\mathcal{J}$) and dimension-averaged approximate reversible information ($\gamma\equiv\gamma_1$). For any given probability density $p(x_t)$ at time $t$, the conditional probability $p(x_{t+1}|x_t)=\mathcal{N}(ax_t, \Sigma)$ of the forward dynamics, with $\Sigma=\sigma^2$, gives the probability density $p(x_{t+1})$ at the next time step $t+1$. In (a) and (d), we set $a=1.5,\sigma=0.2$ and obtain the dimension-averaged EI $\mathcal{J}=0.596$ and the dimension-averaged reversible information $\gamma=1.812$. In (b) and (e), we decrease $\sigma$ to $\sigma=0.1$. Because determinism increases, the dimension-averaged EI increases to $\mathcal{J}=1.289$ and the dimension-averaged reversible information increases to $\gamma=2.505$, indicating that approximate reversibility and causal effect strength increase together. In (c) and (f), we then increase $a$ to $a=2$. Because non-degeneracy increases, the dimension-averaged EI increases to $\mathcal{J}=1.576$ and the dimension-averaged reversible information increases to $\gamma=2.649$. These simple cases suggest three observations: (1) $\mathcal{J}$ and $\gamma$ depend only on $p(x_{t+1}|x_t)$ and are independent of the probability density $p(x_t)$ of $x_t$, which can be chosen arbitrarily; (2) $\mathcal{J}$ and $\gamma$ are both positively correlated with determinism and non-degeneracy: both increase as $a$ increases and both decrease as $\sigma$ increases; (3) $\mathcal{J}$ and $\gamma$ are positively correlated with each other.
  • Figure 2: (a) For $p(x_{t+1}|x_t)=\mathcal{N}(Ax_t+a_0,\Sigma)$, the inverse covariance matrices of the forward dynamics $x_{t+1}=a_0+Ax_t+\eta_t$ and the backward dynamics $x_t=A^\dagger x_{t+1}-A^\dagger a_0-A^\dagger\eta_t$ are $\Sigma^{-1}$ and $A^T\Sigma^{-1}A$, respectively. Together, $\Sigma^{-1}$ and $A^T\Sigma^{-1}A$ determine the approximate dynamical reversibility $\Gamma_\alpha$. (b) $\alpha=1$ is often chosen to balance $\Gamma_\alpha$'s emphasis on forward (determinism) and backward (non-degeneracy) dynamics. For $\alpha<1$, $\Gamma_\alpha$ captures more information about the backward dynamics $p(x_{t}|x_{t+1})$ (non-degeneracy). For $\alpha>1$, $\Gamma_\alpha$ emphasizes information about the forward dynamics $p(x_{t+1}|x_{t})$ (determinism).
  • Figure 3: A case where vague CE occurs. $A$ and $\Sigma$ in (a) define the GIS dynamics of a rotation model. In (b), the two singular values of the third dimension are less than the threshold $\epsilon=1.0$, so $\Delta\Gamma_\alpha(\epsilon)>0$ and vague CE occurs. In (c), the dynamic trajectory of macroscopic states in 3-dimensional Euclidean space $\mathcal{R}^3$ is shown. (d) and (e) show the macroscopic dynamics trajectory in $\mathcal{R}^2$ obtained by the SVD method and the coarse-graining strategy on which it is based. (f) shows that as $A_{33}$, the element in the 3rd row and 3rd column of $A$, increases, the strength of CE ($\Delta\Gamma_{\alpha}$) decreases because the difference $s_2-s_3(>0)$ between the singular values $s_2$ and $s_3$ of $A^T\Sigma^{-1}A$ shrinks. For comparison, the strength of CE under the EI-maximization framework ($\Delta\mathcal{J}$) decreases in a similar way.
  • Figure 4: Two examples of dynamics where CE occurs: in one, CE is primarily caused by the coefficient matrix $A$; in the other, it is primarily caused by the covariance $\Sigma$. (a) A Malthusian growth model where $x_3,x_4$ are copies of $x_1,x_2$. (b) A sample with growth rates of 0.2 and 0.05. $x_i$, $i=1,2,3,4$, are micro-states, while $y_1$ and $y_2$ are macro-states derived by the method described in Section \ref{sce:causalemergencesvd}. (c) The original parameter matrix $A$. (d) Parameter matrix $A$ with perturbations. (e) The original backward-dynamics inverse covariance matrix $A^T\Sigma^{-1}A$ with rank $r_s=2$. (f) $A^T\Sigma^{-1}A$ after random perturbations to $A$. (g) The singular value spectra of $A^T\Sigma^{-1}A$ and $\Sigma^{-1}$ when $A^T\Sigma^{-1}A$ has only two nonzero singular values. Clear CE occurs, and its strength is $\Delta\Gamma_\alpha(0)=0.4034$. (h) The singular value spectra of $A^T\Sigma^{-1}A$ and $\Sigma^{-1}$ when $A$ is perturbed. Vague CE occurs, and its strength is $\Delta\Gamma_\alpha(\epsilon)=0.4195$ when $\epsilon=2$. (i) The coarse-graining parameter $W$ derived by the method described in Section \ref{Coarse-graining} and Appendix E.2, where the number of columns represents the macroscopic dimensions and the number of rows represents the microscopic dimensions. (j) The coarse-graining parameter $W$ for the vague CE case: the first macro-state dimension is determined by $x_1,x_3$, and the second macro-state dimension is determined by $x_2,x_4$. (k) Sample trajectories of $x_t$ in Brownian motion. (l) Parameter $A$ of the drift term $f(x_t)$. (m) and (n) show the singular matrices $U$ and $V$. (o) and (p) show the inverse covariance matrices of the forward and backward dynamics, $\Sigma^{-1}$ and $A^T\Sigma^{-1}A$. (q) Singular value spectra of $\Sigma^{-1}$ and $A^T\Sigma^{-1}A$ with $\epsilon=0.6$. (r) Since $\Delta\Gamma_\alpha(\epsilon)$ is the theoretical value of CE, it can be compared with the difference $\gamma_\alpha^W-\gamma_\alpha$ between the $\gamma^W_\alpha$ of the coarse-grained macro-dynamics and the $\gamma_\alpha$ of the original micro-dynamics. With $W$ derived as in Section \ref{Coarse-graining}, the approximate SVD-based CE is $\Delta\Gamma_\alpha^W=\gamma_\alpha^W-\gamma_\alpha=0.4907$, which is close to the theoretical value $\Delta\Gamma_\alpha(\epsilon)=0.5167$. (s) The coarse-graining parameter matrix $W$ derived by the methods in Section \ref{Coarse-graining}, which preserves the non-conflicting 2nd and 3rd dimensions, along with the 1st and 5th dimensions with larger singular values $\kappa_1$ and $s_1$.
  • Figure 5: An example of applying machine learning to extract dynamics from data and measure the CE of the dynamical system. (a) Schematic diagram of the SIR model. (b) Schematic diagram of learning the dynamics and calculating $\Gamma$ with an NN trained on data. (1) and (3) are the variables $x_t$ and the predicted $x_{t+\Delta t}$ at adjacent time points. (2) The NN model, whose structure is given in Appendix F.3. (4) Dynamics $f(x_t)$ learned by the NN model from SIR data. (5) $A_{x_t}=\nabla f(x_t)$ is the Jacobian matrix of the trained model at $x_t$. (6) The output $L_\Sigma$ of the Covariance Learner Network, from which the covariance matrix $\Sigma_{x_t}$ is derived. (7) The numerical solution of $\Gamma$, which depends on $A$ and $\Sigma$. (8) Using $A^T\Sigma^{-1}A=\left<A^T_{x_t}\Sigma^{-1}_{x_t}A_{x_t}\right>_{x_t\in\mathcal{X}}$ and $\Sigma^{-1}=\left<\Sigma^{-1}_{x_t}\right>_{x_t\in\mathcal{X}}$, the SVD-based CE $\Delta\Gamma(\epsilon)$ can be calculated for a suitable $\epsilon$, where $\mathcal{X}$ is the domain of the SIR model. (c) Training data, the same as that used for NIS+ (Yang2024). The full dataset (entire triangular region) used for training is displayed, along with four example trajectories with the same infection and recovery or death rates. The method of generating the training data can be found in Appendix F.3. (d) $A^T\Sigma^{-1}A=\left<A^T_{x_t}\Sigma^{-1}_{x_t}A_{x_t}\right>_{x_t\in\mathcal{X}}$ derived from the NN-trained SIR model. (e) $\Sigma^{-1}=\left<\Sigma^{-1}_{x_t}\right>_{x_t\in\mathcal{X}}$ derived from the NN-trained SIR model. (f) Distribution of $r_\epsilon$ obtained by traversing the dynamical space of SIR over different $(S,I)$ and the $x_t$ generated from them. (g) Distribution of $\Delta\Gamma_\alpha$ under different $(S,I)$ obtained by traversing the dynamical space. (h) The singular value spectra of $A^T\Sigma^{-1}A$ and $\Sigma^{-1}$ in the NN-trained SIR model. (i) The frequency of $r_\epsilon$ over different samples $x_t$ in the test dataset. (j) Maximum $\Delta\Gamma_\alpha(\epsilon)=0.8685$ when the training period is around 50,000, with $\epsilon=5$. (k) The trend of CE under different $\sigma$; the threshold for the trend change is around $\sigma=0.01$. (l) The coarse-graining matrix $W$ derived as in Section \ref{Coarse-graining} and Appendix E.2, using $A^T\Sigma^{-1}A=\left<A^T_{x_t}\Sigma^{-1}_{x_t}A_{x_t}\right>_{x_t\in\mathcal{X}}$ and $\Sigma^{-1}=\left<\Sigma^{-1}_{x_t}\right>_{x_t\in\mathcal{X}}$. (m) The coarse-graining matrix $W_{NIS+}$ derived from NIS+. For ease of comparison, absolute values are taken for each dimension in $W$ and $W_{NIS+}$.
  • ...and 4 more figures
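
The numbers quoted in the Figure 1 caption can be checked with a short script. The closed forms used below are not stated in this summary and are assumptions: $\mathcal{J}=\ln(|a|/\sigma)-\tfrac{1}{2}\ln(2\pi e)$ (an intervention interval of unit length) and $\gamma_\alpha=(\tfrac{1}{2}-\tfrac{\alpha}{4})\ln(a^2/\sigma^2)+\tfrac{\alpha}{4}\ln(1/\sigma^2)$, where $a^2/\sigma^2$ and $1/\sigma^2$ are the 1-dimensional versions of $A^T\Sigma^{-1}A$ and $\Sigma^{-1}$. They were chosen because they reproduce the caption's values to about three decimals; the function names are hypothetical.

```python
import math

def dim_avg_ei(a, sigma, interval_length=1.0):
    """Assumed 1-D closed form for the dimension-averaged EI (J)."""
    return math.log(interval_length * abs(a) / sigma) - 0.5 * math.log(2 * math.pi * math.e)

def dim_avg_gamma(a, sigma, alpha=1.0):
    """Assumed 1-D closed form for the dimension-averaged reversible information."""
    s = a**2 / sigma**2      # 1-D backward inverse covariance, a * sigma^-2 * a
    kappa = 1.0 / sigma**2   # 1-D forward inverse covariance, sigma^-2
    return (0.5 - alpha / 4) * math.log(s) + (alpha / 4) * math.log(kappa)

# The three (a, sigma) settings named in the Figure 1 caption.
settings = [(1.5, 0.2), (1.5, 0.1), (2.0, 0.1)]
results = [(a, sigma, dim_avg_ei(a, sigma), dim_avg_gamma(a, sigma)) for a, sigma in settings]

for a, sigma, J, gamma in results:
    print(f"a={a}, sigma={sigma}: J={J:.3f}, gamma={gamma:.3f}")
```

Under these assumed forms, setting `alpha=0` leaves only the backward term and `alpha=2` leaves only the forward term, which agrees with the description of $\alpha$ in the Figure 2 caption.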

Theorems & Definitions (27)

  • Definition Appendix A.1
  • Definition Appendix A.2
  • Definition Appendix B.1
  • proof
  • Corollary Appendix B.1
  • Definition Appendix C.1
  • Lemma Appendix C.1
  • proof
  • Theorem Appendix C.1
  • proof
  • ...and 17 more