
Remarks on Lipschitz-Minimal Interpolation: Generalization Bounds and Neural Network Implementation

Arthur C. B. de Oliveira, Ruigang Wang, Ian R. Manchester, Eduardo D. Sontag

Abstract

This note establishes a theoretical framework for finding (potentially overparameterized) approximations of a function on a compact set with a-priori bounds for the generalization error. The approximation method considered is to choose, among all functions that (approximately) interpolate a given data set, one with a minimal Lipschitz constant. The paper establishes rigorous generalization bounds over practically relevant classes of approximators, including deep neural networks. It also presents a neural network implementation based on Lipschitz-bounded network layers and an augmented Lagrangian method. The results are illustrated for a problem of learning the dynamics of an input-to-state stable system with certified bounds on simulation error.
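For exact interpolation with scalar outputs, the Lipschitz-minimal problem described above has a classical closed-form answer. The smallest achievable Lipschitz constant is the steepest pairwise slope in the data, and the McShane extension attains it. The sketch below illustrates only that classical construction in Python/NumPy. It is not the paper's neural-network implementation, which uses Lipschitz-bounded layers and an augmented Lagrangian method. It assumes real-valued outputs, distinct inputs, and the Euclidean norm, and the function names are hypothetical.

```python
import numpy as np


def min_lipschitz_constant(X, y):
    """Smallest Lipschitz constant (Euclidean norm) of any scalar function
    that interpolates the samples (X[i], y[i]) exactly."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(y) < 2:
        return 0.0  # a single sample is interpolated by a constant
    # Pairwise input distances and output gaps (assumes distinct rows of X).
    dX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dy = np.abs(y[:, None] - y[None, :])
    off_diag = ~np.eye(len(y), dtype=bool)
    return float(np.max(dy[off_diag] / dX[off_diag]))


def mcshane_interpolant(X, y, L):
    """McShane extension f(x) = min_i (y_i + L * ||x - x_i||).

    f is L-Lipschitz, and it interpolates the data whenever
    L >= min_lipschitz_constant(X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def f(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))  # shape (M, d)
        d = np.linalg.norm(x[:, None, :] - X[None, :, :], axis=-1)  # shape (M, N)
        return np.min(y[None, :] + L * d, axis=1)

    return f


# Usage: three 1-D samples; the steepest pairwise slope is 2 (between x=0 and x=1),
# so L == 2.0 and f reproduces y at the samples.
X = [[0.0], [1.0], [3.0]]
y = [0.0, 2.0, 3.0]
L = min_lipschitz_constant(X, y)
f = mcshane_interpolant(X, y, L)
print(L, f(X), f([[2.0]]))
```

The minimizer is generally not unique: the Whitney extension $\max_i(y_i - L\|x-x_i\|)$ attains the same constant. Noisy data, which calls for approximate rather than exact interpolation, is not covered by this closed form.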

Paper Structure

This paper contains 13 sections, 7 theorems, 61 equations, 3 figures, 3 tables.

Key Result

Lemma 1

Given $g\in L(\mathcal{D})$, let $\mathbb{D}_{N}^{\overline\varepsilon}$ be a noisy dataset with noise bound $\overline\varepsilon$. Furthermore, for any $\varepsilon>0$, let $f^*$ be any function in $L(\mathcal{D})$ that satisfies $\ell(\mathbb{D}_{N}^{\overline\varepsilon},f^*)\leq \varepsilon$, …
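The statement of Lemma 1 is truncated above, so its exact bound is not reproduced here. The derivation below is only a hedged illustration of the standard triangle-inequality argument behind bounds of this kind. It uses hypothetical notation that is not taken from the source:

- the data are pairs $(x_i,\tilde y_i)$ with $\tilde y_i = g(x_i)+e_i$ and $\|e_i\|\le\overline\varepsilon$;
- $\ell$ is the largest data misfit $\max_i\|f^*(x_i)-\tilde y_i\|$;
- $g$ and $f^*$ have Lipschitz constants $L_g$ and $L_{f^*}$;
- $h$ is the fill distance of $\{x_i\}$ in $\mathcal D$.

For any $x\in\mathcal D$, let $x_i$ be a nearest data point, so that $\|x-x_i\|\le h$:

```latex
\begin{aligned}
\|f^*(x)-g(x)\|
&\le \|f^*(x)-f^*(x_i)\| + \|f^*(x_i)-\tilde y_i\| + \|\tilde y_i-g(x_i)\| + \|g(x_i)-g(x)\| \\
&\le L_{f^*}\|x-x_i\| + \varepsilon + \overline\varepsilon + L_g\|x-x_i\| \\
&\le (L_{f^*}+L_g)\,h + \varepsilon + \overline\varepsilon .
\end{aligned}
```

Under these assumptions, the data fit $\varepsilon$ and the Lipschitz constant $L_{f^*}$ are the only terms the learner controls. Choosing a Lipschitz-minimal approximate interpolant therefore tightens a bound of this form.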

Figures (3)

  • Figure 1: Trajectory errors of different models over 500 test data samples.
  • Figure 2: A trajectory sample of learnt models.
  • Figure 3: The output channel $f_i$ of the learnt models with fixed $x_2$ over the region $(x_1, x_3) \in [-2,2]\times[-2,2]$.

Theorems & Definitions (16)

  • Lemma 1
  • Proof
  • Definition 1: Fill Distance/Covering Radius [reznikov2016covering]
  • Corollary 1
  • Proof
  • Theorem 1
  • Proof
  • Theorem 2
  • Proof
  • Corollary 2
  • ...and 6 more
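Definition 1 (fill distance, also called covering radius) is listed above without its statement. This sketch assumes the standard definition $h = \sup_{x\in\mathcal D}\min_i \|x - x_i\|$ for a data set $\{x_i\}$ in a compact set $\mathcal D$, and estimates that quantity numerically. It further assumes $\mathcal D$ is an axis-aligned box and evaluates the inner minimum on a uniform grid. The function name and grid parameters are hypothetical. Because the supremum is taken only over grid points, the estimate never exceeds the true $h$. It falls short by at most half the diagonal of one grid cell.

```python
import numpy as np


def fill_distance_on_box(X, lower, upper, points_per_axis=101, chunk=4096):
    """Grid estimate of the fill distance h = sup_{x in D} min_i ||x - x_i||
    for the axis-aligned box D = [lower, upper] (an assumption, not the paper's D)."""
    X = np.asarray(X, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    axes = [np.linspace(lo, hi, points_per_axis) for lo, hi in zip(lower, upper)]
    # All grid points as rows of a (points_per_axis**d, d) array.
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, len(axes))
    h = 0.0
    for start in range(0, len(grid), chunk):  # process in chunks to bound memory use
        block = grid[start:start + chunk]
        # Distance from each grid point in the block to its nearest sample.
        d = np.linalg.norm(block[:, None, :] - X[None, :, :], axis=-1)
        h = max(h, float(d.min(axis=1).max()))
    return h


# Usage: one sample at the centre of the unit square; the farthest points are the corners.
h = fill_distance_on_box([[0.5, 0.5]], lower=[0.0, 0.0], upper=[1.0, 1.0])
print(h)
```

The brute-force grid grows as (points per axis)$^d$, so this approach is only practical in low dimension. In higher dimensions, random sampling of $\mathcal D$ combined with a nearest-neighbour index is the usual substitute.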