Bounds on Deep Neural Network Partial Derivatives with Respect to Parameters
Omkar Sudhir Patil, Brandon C. Fallin, Cristian F. Nino, Rebecca G. Hart, Warren E. Dixon
TL;DR
This work addresses the need for explicit, computable bounds on the parameter derivatives of deep neural networks, which Lyapunov-based stability guarantees in real-time control rely on. It develops rigorous polynomial bounds on the first and second partial derivatives of fully-connected DNNs with respect to the parameter vector $\theta$, accommodating sigmoidal and ReLU-like activations and providing closed-form, computable expressions. The authors introduce structured bounds for layer outputs, Jacobians, and Hessians, including auxiliary quantities $\mathcal{Q}_j$, $\mathcal{R}_{w,q,j}$, and $\mathcal{T}_{w,j}$, and derive bounds on the mixed second derivatives and on the overall Jacobian $\partial \Phi / \partial \theta$. They further bound the higher-order Taylor remainder $R(\sigma, \tilde{\theta})$ by a polynomial function $\rho_0(\Vert \sigma \Vert)$ times $\Vert \tilde{\theta} \Vert^2$, supporting convergence and stability analyses for gradient-based learning in safety-critical control systems.
Abstract
Deep neural networks (DNNs) have emerged as a powerful tool, and a growing body of literature explores Lyapunov-based approaches that use them for real-time system identification and control. These methods depend on establishing bounds for the second partial derivatives of DNNs with respect to their parameters, a requirement often assumed but rarely addressed explicitly. This paper provides rigorous mathematical formulations of polynomial bounds on both the first and second partial derivatives of DNNs with respect to their parameters. We present lemmas that characterize these bounds for fully-connected DNNs, while accommodating various classes of activation functions, including sigmoidal and ReLU-like functions. Our analysis yields closed-form expressions that enable precise stability guarantees for Lyapunov-based deep neural networks (Lb-DNNs). Furthermore, we extend our results to bound the higher-order terms in first-order Taylor approximations of DNNs, providing important tools for convergence analysis in gradient-based learning algorithms. The resulting theoretical framework provides explicit, computable expressions for previously assumed bounds, thereby strengthening the mathematical foundation of neural network applications in safety-critical control systems.
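As a sketch of how such a remainder bound enters a first-order Taylor approximation, suppose the DNN is written as $\Phi(\sigma, \theta)$ with input $\sigma$, and let $\hat{\theta}$ denote a parameter estimate with error $\tilde{\theta} = \theta - \hat{\theta}$. The symbol $\hat{\theta}$ and this sign convention are assumptions made here for illustration; they are not fixed by the text above. Expanding about $\hat{\theta}$ would then give

$$
\Phi(\sigma, \theta) = \Phi(\sigma, \hat{\theta}) + \frac{\partial \Phi}{\partial \theta}(\sigma, \hat{\theta})\, \tilde{\theta} + R(\sigma, \tilde{\theta}), \qquad \Vert R(\sigma, \tilde{\theta}) \Vert \le \rho_0(\Vert \sigma \Vert)\, \Vert \tilde{\theta} \Vert^2 .
$$

Under these assumptions, the first-order term is linear in the parameter error while the remainder is at most quadratic in it, with a coefficient that grows only polynomially in $\Vert \sigma \Vert$. This is the structure a Lyapunov-based analysis would typically exploit: for sufficiently small $\Vert \tilde{\theta} \Vert$, the linear term dominates the remainder.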
