Machine Learning and Control: Foundations, Advances, and Perspectives
Enrique Zuazua
TL;DR
This work views neural networks through a control-theoretic lens, treating shallow and deep architectures as dynamical systems and studying training both as an optimization problem and as an ensemble or simultaneous controllability problem. It develops a convex relaxation for shallow networks that incurs no relaxation gap and clarifies how width and tolerance influence generalization. It then extends this view to ResNets and Neural ODEs, revealing depth-width tradeoffs and yielding constructive controllability results. The analysis of transformers casts self-attention as a clustering mechanism, enabling exact interpolation with bounds that are independent of input length and pointing to a principled path toward more efficient architectures. Collectively, the paper provides theoretical foundations connecting control theory, optimization, and PDEs with modern ML models, offers practical design principles, and highlights avenues for future work in diffusion models, federated learning, and hybrid physics-data modeling (HYCO).
Abstract
The control theory of dynamical systems offers a powerful framework for tackling challenges in deep neural networks and other machine learning architectures. We show that concepts such as simultaneous and ensemble controllability provide new insights into the classification and representation properties of deep neural networks, while the control and optimization of static systems can be employed to better understand the performance of shallow networks. Inspired by the classical concept of turnpike, we also explore the relationship between dynamic and static neural networks, where depth is traded for width, and the role of transformers as mechanisms for accelerating classical neural network tasks. In addition, we exploit the expressive power of neural networks (exemplified, for instance, by the Universal Approximation Theorem) to develop a novel hybrid modeling methodology, Hybrid-Cooperative Learning (HYCO), which combines mechanics and data-driven methods in a game-theoretic setting. Finally, we describe how classical properties of diffusion processes, long established in the context of partial differential equations, help explain the success of modern generative artificial intelligence (AI). We present an overview of our recent results in these areas, illustrating how control, machine learning, numerical analysis, and partial differential equations come together to create fertile ground for future research.
