Observability conditions for neural state-space models with eigenvalues and their roots of unity

Andrew Gracyk

TL;DR

This work investigates observability in neural state-space contexts, focusing on the Mamba architecture, by recasting observability in terms of ODE/control-theoretic concepts and Fourier-domain representations. It develops a suite of strategies to enforce observability that are tailored for high-dimensional, learnable latent states, including permutation-based designs with roots of unity, Fourier-transform–based conditions, and a Vandermonde-adapted Hautus test, along with a shared-parameter coupling that yields scalable exponentiation and Robbins-Monro–consistent training. Theoretical results demonstrate that observability can be achieved with high probability under structured matrix conditions and that the proposed training procedures can converge, while classical approaches may fail to contract in high-Lipschitz regimes. These contributions offer computationally efficient, provably observable neural state-space formulations, enabling reliable latent-state inference in long-horizon, high-dimensional settings with explicit control-theoretic guarantees. Collectively, the work bridges control theory and neural sequence modeling to enable scalable, observable latent dynamics with principled initialization and training dynamics.

Abstract

We operate through the lens of ordinary differential equations and control theory to study the concept of observability in the context of neural state-space models and the Mamba architecture. We develop strategies to enforce observability that are tailored to a learning context, specifically one where the hidden states are high-dimensional and learnable at the initial time as well as over the continuum. Our methods emphasize eigenvalues, roots of unity, or both, and they achieve computational efficiency when enforcing observability, sometimes at great scale. We formulate observability conditions in machine learning based on classical control theory and discuss their computational complexity. Our nontrivial results are fivefold. We discuss observability through the use of permutations in neural applications with learnable matrices without high precision. We present two results built upon the Fourier transform that effect observability with high probability up to the randomness in the learning. These results rely on the interplay of representations in Fourier space and their eigenstructure, nonlinear mappings, and the observability matrix. We present a result for Mamba that is similar to a Hautus-type condition but employs an argument using a Vandermonde matrix instead of eigenvectors. Our final result is a shared-parameter construction of the Mamba system, which is computationally efficient in high exponentiation. We develop a training algorithm with this coupling, showing it satisfies a Robbins-Monro condition under certain orthogonality, while a more classical training procedure fails to satisfy a contraction with high Lipschitz constant.
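As a point of reference for the classical condition the abstract builds on, the following is a minimal sketch of the Kalman rank test: a linear system with state matrix $A$ and output matrix $C$ is observable when the stacked matrix of $C, CA, \dots, CA^{n-1}$ has rank $n$. The example uses a cyclic permutation matrix, whose eigenvalues are the $n$-th roots of unity, purely as an illustration. The helper names (`matmul`, `observability_matrix`, `rank`, `cyclic_shift`) and the choice of $C$ are assumptions made here, not the paper's construction.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) row-wise, where n is the state dimension."""
    n = len(A)
    rows, block = [], C
    for _ in range(n):
        rows.extend(block)
        block = matmul(block, A)
    return rows


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def cyclic_shift(n):
    """n x n cyclic permutation matrix (characteristic polynomial x^n - 1)."""
    return [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]


n = 5
A = cyclic_shift(n)

# Hypothetical output map reading only the first state coordinate.
C_good = [[1] + [0] * (n - 1)]
is_observable = rank(observability_matrix(A, C_good)) == n

# An all-ones output row is invariant under the permutation, so every
# block C A^k repeats the same row and the rank collapses.
C_bad = [[1] * n]
is_observable_bad = rank(observability_matrix(A, C_bad)) == n

print(is_observable, is_observable_bad)
```

The exact rational arithmetic avoids floating-point rank decisions in this toy case; forming and ranking the full observability matrix becomes costly as $n$ grows, which is the kind of cost the structured conditions in the abstract aim to avoid.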

Paper Structure

This paper contains 25 sections, 202 equations, 14 figures.

Figures (14)

  • Figure 1: We illustrate observability: ambient output states may be used to learn the initial hidden state. We focus on scenarios in which this hidden state is learnable, i.e., not fixed; otherwise the learning task of observability is trivial.
  • Figure 2: This figure illustrates that the Fourier kernel loss in equation \ref{eqn:fourier_kernel_loss} is solvable and that $\mathcal{L} = 0$ is attainable. We do so by systematically constructing vectors in the kernel, following the method in the appendix. In (a), we plot $\mathcal{L}$, with training iteration mod 2 on the x-axis and loss on the y-axis. We chose $m=250$, $n=400$, and $\text{positive constant} = 0.05$. In (b), we plot the learned eigenvalues in the complex plane. Color represents the norm.
  • Figure 3: This figure highlights a few important concepts. First, it shows the number of distinct kernels across $\Psi^j$ with respect to $j$, similar to the figures in the appendix. Higher is better, and the learned eigenvalues outperform the random ones. The primary concept this figure illustrates is that the loss functions we develop in our Fourier-based theorems are significant: they outperform eigenvalues that are simply random in ensuring the distinct kernel conditions hold. Here, green corresponds to the eigenvalues learned with our custom loss functions, while red corresponds to random eigenvalues. A second concept this figure illustrates is that we can control $\Delta$ predictably. In particular, the $\Delta$ used in the loss function is the same $\Delta$ used to recreate these kernel results. We chose $\Delta = 0.1$, $n=40$, $m=25$.
  • Figure 4: We illustrate that Theorem 4 is empirically valid. Panel (a) illustrates observability and the Fourier and eigenvalue loss of equation \ref{eqn:loss_thm4}; panel (b) illustrates the observability loss with the observability matrix determinant. We choose $n=50$, $m=25$. We take the log of the determinant, which helps avoid degeneracy from the accumulation of small values. If the observability matrix were low rank, the blue loss would be infinite, which we do not observe (except near initialization), so the system is observable. The loss from \ref{eqn:loss_thm4} reaches exactly zero because we use a ReLU-type loss.
  • Figure 5: In (a), regarding Theorem 4, satisfying the distinct eigenvalues condition does not automatically imply the distinctness of $\Psi$ with respect to $j$. Thus, both loss terms of equation \ref{eqn:loss_thm4} are valuable in enforcing observability with high probability. In (b), we provide a Hautus loss condition and show that the minimum norm over the columns of $CV$ is nonzero, regarding Theorem 4, in a very simple state-space model setup with a learnable initial hidden state. Here, $n=50$, $m=25$ as well. Again, this eigenvector condition is necessary, not sufficient, as shown by the counterexample in section \ref{sec:hautus}.
  • ...and 9 more figures
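
The log-determinant idea mentioned in the Figure 4 caption can be illustrated in a simplified setting. The sketch below assumes a diagonal state matrix $A = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ and a single output row $c$, which is not necessarily the paper's setup. In that case the observability matrix factors as a Vandermonde matrix times $\mathrm{diag}(c)$, so $|\det \mathcal{O}| = \prod_i |c_i| \prod_{i<j} |\lambda_j - \lambda_i|$, and summing logs avoids multiplying many small factors. The function name `log_abs_det_observability` is hypothetical.

```python
import cmath
import math


def log_abs_det_observability(eigs, c):
    """log|det O| for A = diag(eigs) and a single output row c.

    O[k][i] = c[i] * eigs[i]**k, i.e. O = V diag(c) with V Vandermonde, so
    |det O| = prod_i |c_i| * prod_{i<j} |eigs[j] - eigs[i]|.
    Returns -inf when O is singular (a zero c_i or a repeated eigenvalue).
    """
    n = len(eigs)
    factors = [abs(ci) for ci in c]
    factors += [abs(eigs[j] - eigs[i]) for i in range(n) for j in range(i + 1, n)]
    if any(f == 0 for f in factors):
        return -math.inf
    # Summing logs instead of multiplying keeps tiny factors from underflowing.
    return sum(math.log(f) for f in factors)


# Eigenvalues at the 4th roots of unity with an all-ones output row.
n = 4
roots = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
value = log_abs_det_observability(roots, [1.0] * n)

# A repeated eigenvalue makes the Vandermonde factor singular.
degenerate = log_abs_det_observability([1.0, 1.0, 2.0], [1.0, 1.0, 1.0])

print(value, degenerate)
```

Under this assumed sign convention, a loss of the form $-\log|\det \mathcal{O}|$ becomes infinite exactly when the system loses observability, which matches the qualitative behavior the caption describes.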