Machine Learning and Control: Foundations, Advances, and Perspectives

Enrique Zuazua

TL;DR

This work views neural networks through a control-theoretic lens, treating shallow architectures as static systems and deep ones as dynamical systems, and studying training both as an optimization problem and as a simultaneous/ensemble controllability problem. It develops a convex-relaxation approach to shallow networks that admits a no-gap solution and clarifies how width and tolerance influence generalization, while extending the view to ResNets and Neural ODEs to reveal depth-width tradeoffs and constructive controllability results. The analysis of transformers casts self-attention as a clustering mechanism, enabling exact interpolation with bounds that are independent of input length and pointing to a principled path toward more efficient architectures. Collectively, the paper provides theoretical foundations connecting control theory, optimization, and PDEs with modern ML models, offering practical design principles and identifying avenues for future work in diffusion models, federated learning, and hybrid physics-data modeling (HYCO).

Abstract

Control theory of dynamical systems offers a powerful framework for tackling challenges in deep neural networks and other machine learning architectures. We show that concepts such as simultaneous and ensemble controllability offer new insights into the classification and representation properties of deep neural networks, while the control and optimization of static systems can be employed to better understand the performance of shallow networks. Inspired by the classical concept of turnpike, we also explore the relationship between dynamic and static neural networks, where depth is traded for width, and the role of transformers as mechanisms for accelerating classical neural network tasks. We also exploit the expressive power of neural networks (exemplified, for instance, by the Universal Approximation Theorem) to develop a novel hybrid modeling methodology, the Hybrid-Cooperative Learning (HYCO), combining mechanics and data-driven methods in a game-theoretic setting. Finally, we describe how classical properties of diffusion processes, long established in the context of partial differential equations, contribute to explaining the success of modern generative artificial intelligence (AI). We present an overview of our recent results in these areas, illustrating how control, machine learning, numerical analysis, and partial differential equations come together to motivate a fertile ground for future research.

Paper Structure

This paper contains 20 sections, 8 theorems, 40 equations, 4 figures.

Key Result

Theorem 2.1

When $P \ge N$, the solution sets of pb:NN_exact_rel and pb:NN_epsilon_rel, denoted respectively by $S(\text{pb:NN\_exact\_rel})$ and $S(\text{pb:NN\_epsilon\_rel})$, are nonempty, convex and compact in the weak-$*$ sense. Moreover, where $\textnormal{Ext}(S)$ represents the set of all extreme points of $S$, $S(P)$ the solution set of the corresponding optimization problem $P$ and $\textnormal{val}(P)$ the minimum val

Figures (4)

  • Figure 3.1: Basic movements generated by each of the neurons in the vector field determining the dynamics of \ref{eq:node}: $(a_i, b_i)$ determines the cutting hyperplanes defining the various regions, and $w_i$ specifies the direction of the "wind" on the active half-space, namely, the side where the neuron drives the dynamics. The blue dashed line represents the hyperplane $\langle a_i, x\rangle+b_i=0$. From left to right: Compression, parallel motion, and expansion, depending on the choice of $w_i$.
  • Figure 3.2: The Voronoi-type decomposition determined by the neural vector field at any given time when several neurons ($P \ge 2$) are involved. In the present case, $P=3$ neurons decompose the plane ($d=2$) into 7 regions in which the neural vector field is oriented differently. In the figure on the right, the vector field vanishes within the lower-center Voronoi cell, which therefore remains stationary under the action of the vector field.
  • Figure 3.3: From left to right: Construction of a vector field whose integral curves interpolate the dataset, defined in a compact domain $\Omega$ containing all the curves.
  • Figure 4.1: Geometric interpretation of \ref{eq:transformer_b} for $i=1$ with (a) $A=I$ and (b) $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$. In (a), tokens $z_2$ and $z_3$ have the largest orthogonal projection on the direction of $A z_1 = z_1$, so $\mathcal{C}_1(Z) = \{ 2,3 \}$. In (b), token $z_4$ has the largest projection on the direction of $A z_1$, so $\mathcal{C}_1(Z) = \{ 4 \}$. In both cases, tokens attracting $z_1$ can only lie on the closed half-space $\mathcal{H}_1 = \{z:\;\langle A z_1, z - z_1 \rangle \geq 0\}$ (blue shading).
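
The geometry described in the captions of Figures 3.1 and 3.2 can be illustrated with a minimal Python sketch. It assumes a ReLU-type field $\sum_i w_i \max(\langle a_i, x\rangle + b_i, 0)$ with parameters frozen in time, which may differ from the paper's eq:node. The values of `a`, `b`, `w` and the helper names are hypothetical and chosen only for illustration. The sketch counts the regions cut out by $P=3$ hyperplanes in the plane and checks that the field vanishes in the cell where no neuron is active.

```python
import numpy as np

# Hypothetical parameters for P = 3 neurons in the plane (d = 2); not taken from the paper.
a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # hyperplane normals a_i
b = np.array([0.0, 0.0, -1.0])                        # offsets b_i
w = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # "wind" directions w_i

def neural_field(x):
    """Assumed ReLU field: sum_i w_i * max(<a_i, x> + b_i, 0)."""
    pre = a @ x + b
    return w.T @ np.maximum(pre, 0.0)

def activation_pattern(x):
    """Which neurons are active (strictly positive pre-activation) at x."""
    pre = a @ x + b
    return tuple(bool(v) for v in pre > 0)

# Each distinct activation pattern on the sample grid corresponds to one region.
grid = np.linspace(-3.0, 3.0, 61)
patterns = {activation_pattern(np.array([x1, x2])) for x1 in grid for x2 in grid}
n_regions = len(patterns)

def euler_flow(x0, T=1.0, steps=100):
    """Explicit Euler integration of x' = neural_field(x) on [0, T]."""
    x = np.array(x0, dtype=float)
    dt = T / steps
    for _ in range(steps):
        x = x + dt * neural_field(x)
    return x

print(n_regions)                 # number of regions found on the grid
print(euler_flow([-1.0, -1.0]))  # a point in the cell where all neurons are inactive
```

With these three hyperplanes in general position, the grid finds 7 activation patterns, in line with the count in the caption of Figure 3.2, and a point starting in the fully inactive cell does not move.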

Theorems & Definitions (8)

  • Theorem 2.1: No-Gap, liu2024representation
  • Theorem 2.2: Generalization bound, liu2024representation
  • Theorem 3.1: Simultaneous controllability, $P=1$, domenecNODES
  • Theorem 3.2: Universal approximation theorem, $P=1$, domenecNODES
  • Theorem 3.3: Depth versus width, alvlop2024interplay
  • Theorem 3.4: Approximate simultaneous control for autonomous NODEs, alvlop2024interplay
  • Theorem 3.5: $L^1$-approximate control of neural transport, DomenecNormalisingFlows
  • Theorem 4.1: Asymptotic clustering, alcalde2025clustering
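
The selection rule behind the clustering picture of Figure 4.1 can be sketched in a few lines of Python. The sketch assumes that $\mathcal{C}_i(Z)$ collects the indices $j$ maximizing $\langle A z_i, z_j\rangle$, which is how the caption reads. The function name `leader_set`, the tolerance `tol`, and the token values are hypothetical and are not the ones used in the figure. Indices in the code are 0-based, whereas the caption counts from 1.

```python
import numpy as np

def leader_set(Z, A, i, tol=1e-9):
    """Indices j (0-based) maximizing <A z_i, z_j>, up to a tolerance.

    Assumed reading of C_i(Z) from the caption of Figure 4.1.
    """
    scores = Z @ (A @ Z[i])  # scores[j] = <A z_i, z_j>
    return {int(j) for j in np.flatnonzero(scores >= scores.max() - tol)}

# Hypothetical tokens in the plane, one per row (not the figure's data).
Z = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [2.0, -1.0],
              [1.0, 4.0]])

A_identity = np.eye(2)
A_skewed = np.array([[2.0, 1.0],
                     [1.0, 1.0]])

leaders_a = leader_set(Z, A_identity, 0)  # tokens attracting z_1 when A = I
leaders_b = leader_set(Z, A_skewed, 0)    # tokens attracting z_1 for the second A

# Every leader lies in the closed half-space H_1 = {z : <A z_1, z - z_1> >= 0}.
in_half_space = all((A_skewed @ Z[0]) @ (Z[j] - Z[0]) >= 0 for j in leaders_b)

print(sorted(j + 1 for j in leaders_a))  # 1-based indices, as in the caption
print(sorted(j + 1 for j in leaders_b))
```

For these illustrative tokens, changing $A$ from the identity to the second matrix moves the leader of $z_1$ from the pair $\{2, 3\}$ to the single token $\{4\}$, mirroring the qualitative behavior described in the caption.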