State-space models through the lens of ensemble control

Ye Feng, Jianfeng Lu

Abstract

State-space models (SSMs) are effective architectures for sequential modeling, but a rigorous theoretical understanding of their training dynamics is still lacking. In this work, we formulate the training of SSMs as an ensemble optimal control problem, where a shared control law governs a population of input-dependent dynamical systems. We derive Pontryagin's maximum principle (PMP) for this ensemble control formulation, providing necessary conditions for optimality. Motivated by these conditions, we introduce an algorithm based on the method of successive approximations. We prove convergence of this iterative scheme along a subsequence and establish sufficient conditions for global optimality. The resulting framework provides a control-theoretic perspective on SSM training.
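To make the ensemble-control viewpoint concrete, here is a minimal sketch of a successive-approximations loop on a toy problem. The abstract does not specify the dynamics, cost, or update rule, so everything below is an assumption made for illustration: scalar linear dynamics $\dot x(t,\omega) = a\,x(t,\omega) + u(t)\,\omega$ with one shared control $u(t)$, a terminal loss $\tfrac12 (x(T,\omega) - y(\omega))^2$ averaged over the ensemble, a quadratic control penalty $\tfrac{\lambda}{2}\int u^2\,dt$, and a damped update with step `alpha`. The function name `msa_ensemble` and all parameters are hypothetical. Each iteration integrates the states forward, integrates the costates backward from $p(T,\omega) = -(x(T,\omega) - y(\omega))$, and then moves $u(t)$ toward the maximizer of the ensemble-averaged Hamiltonian $H = p\,(a x + u\,\omega) - \tfrac{\lambda}{2}u^2$.

```python
def msa_ensemble(omegas, targets, a=-1.0, lam=1.0, T=1.0,
                 n_steps=100, n_iters=50, alpha=0.5, x0=0.0):
    """Damped successive approximations on a toy ensemble control problem.

    Assumed (hypothetical) setup, not taken from the paper:
      dynamics  x' = a*x + u(t)*omega   (one system per omega, shared u)
      cost      mean_omega 0.5*(x(T) - y)^2 + 0.5*lam*int u^2 dt
    """
    dt = T / n_steps
    m = len(omegas)
    u = [0.0] * n_steps          # shared control on a uniform time grid
    costs = []
    for _ in range(n_iters):
        # Forward pass: explicit Euler for every ensemble member.
        x = [x0] * m
        for k in range(n_steps):
            x = [xi + dt * (a * xi + u[k] * w) for xi, w in zip(x, omegas)]
        terminal = sum((xi - y) ** 2 for xi, y in zip(x, targets)) / (2 * m)
        running = 0.5 * lam * dt * sum(uk * uk for uk in u)
        costs.append(terminal + running)

        # Backward pass: costate p' = -a*p with p(T) = -(x(T) - y).
        p = [-(xi - y) for xi, y in zip(x, targets)]
        u_new = list(u)
        for k in range(n_steps - 1, -1, -1):
            # Maximizer of the ensemble-averaged Hamiltonian in u:
            # d/du mean[p*u*omega] - lam*u = 0  =>  u = mean[p*omega] / lam.
            u_star = sum(pi * w for pi, w in zip(p, omegas)) / (m * lam)
            # Damped step toward the maximizer (alpha = 1 is the undamped rule).
            u_new[k] = (1 - alpha) * u[k] + alpha * u_star
            p = [pi + dt * a * pi for pi in p]   # step the costate back in time
        u = u_new
    return u, costs


# Hypothetical ensemble: 11 inputs omega in [0.5, 1.5], with target y = omega.
omegas = [0.5 + 0.1 * i for i in range(11)]
targets = list(omegas)
u_opt, cost_history = msa_ensemble(omegas, targets)
```

For this linear-quadratic toy case the damped update coincides with gradient descent on the discretized cost with step `alpha / lam`, which is why a moderate `alpha` is enough for the cost to decrease. For nonlinear dynamics that equivalence no longer holds, and the choice of damping would need separate care.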

Paper Structure

This paper contains 9 sections, 7 theorems, 155 equations, 1 table, and 1 algorithm.

Key Result

Theorem 3.1

Let $(u^*(\cdot), \{x^*(\cdot, \omega)\}_{\omega \in \Omega})$ be a $W^{1,1}$-local minimizer for problem (P). Suppose that there exists $\delta > 0$ such that [...]. Then there exists $p(\cdot, \cdot): [0,T] \times \Omega \to \mathbb{R}^n$ such that [...]

Theorems & Definitions (15)

  • Remark 2.1
  • Definition 3.1
  • Theorem 3.1
  • Theorem 3.2
  • Remark 3.1
  • Lemma 4.1
  • Theorem 4.1
  • proof
  • Lemma 5.1
  • proof
  • ...and 5 more