Networks with Finite VC Dimension: Pro and Contra

Vera Kurkova, Marcello Sanguineti

TL;DR

This work analyzes neural networks with finite VC dimension under probabilistic task distributions, contrasting uniform convergence of empirical errors with function-approximation capabilities. It develops probabilistic bounds using the bounded-differences framework and McDiarmid inequalities, tying them to the growth function $\Pi_{\mathcal{H}}(m)$ of I/O function classes. A key result is that finite VC dimension yields strong concentration for empirical errors but can cause approximation errors to concentrate around large means when the distribution over functions is uniform; nonuniform priors can yield small mean approximation errors if a well-aligned function $h^*$ exists. The paper also discusses the impact of depth in ReLU networks on these trade-offs and offers guidance on when universal approximation is practically feasible under probabilistic task models.
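
As a rough illustration of why a finite VC dimension limits approximation under a uniform distribution over binary functions, the sketch below evaluates the standard Sauer-Shelah upper bound $\Pi_{\mathcal{H}}(m) \le \sum_{i=0}^{d} \binom{m}{i}$ and compares it with the $2^m$ possible labelings of $m$ points. The function names `sauer_bound` and `covered_fraction_bound`, and the choice $d = 3$, are assumptions made for this example; the paper's own bounds may be stated differently.

```python
from math import comb


def sauer_bound(m: int, d: int) -> int:
    """Sauer-Shelah upper bound on the growth function of a class with VC dimension d."""
    # For m <= d the bound equals 2**m (every labeling may be realizable).
    return sum(comb(m, i) for i in range(min(m, d) + 1))


def covered_fraction_bound(m: int, d: int) -> float:
    """Upper bound on the fraction of all 2**m labelings of m points the class can realize."""
    return sauer_bound(m, d) / 2 ** m


if __name__ == "__main__":
    d = 3  # hypothetical VC dimension, chosen only for illustration
    for m in (10, 20, 40, 80):
        print(f"m={m:3d}  bound={sauer_bound(m, d):8d}  "
              f"fraction<={covered_fraction_bound(m, d):.3e}")
```

The bound grows polynomially in $m$ while the number of labelings grows exponentially, so the realizable fraction shrinks quickly as the data set gets larger.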

Abstract

Approximation and learning of classifiers of large data sets by neural networks in terms of high-dimensional geometry and statistical learning theory are investigated. The influence of the VC dimension of sets of input-output functions of networks on approximation capabilities is compared with its influence on consistency in learning from samples of data. It is shown that, whereas finite VC dimension is desirable for uniform convergence of empirical errors, it may not be desirable for approximation of functions drawn from a probability distribution modeling the likelihood that they occur in a given type of application. Based on the concentration-of-measure properties of high dimensional geometry, it is proven that both errors in approximation and empirical errors behave almost deterministically for networks implementing sets of input-output functions with finite VC dimensions in processing large data sets. Practical limitations of the universal approximation property, the trade-offs between the accuracy of approximation and consistency in learning from data, and the influence of depth of networks with ReLU units on their accuracy and consistency are discussed.

Paper Structure

This paper contains 6 sections, 4 theorems, 35 equations.

Key Result

Theorem 3.1

Let $A_1, \dots, A_m \subset {\mathbb R}^d$, let $\phi: \prod_{i=1}^m A_i \to {\mathbb R}$ satisfy the bounded differences condition with the vector of parameters $c:= (c_1, \ldots, c_m)$, let ${\mathcal{P}} = \prod_{i=1}^m {\mathcal{P}}_i$ be a probability on $\prod_{i=1}^m A_i$, and let $z_1, \ldots, z_m$ be …

Theorems & Definitions (4)

  • Theorem 3.1: McDiarmid Bound
  • Proposition 3.2
  • Theorem 4.1
  • Theorem 4.2
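
To make the McDiarmid bound concrete, here is a minimal sketch that uses the textbook two-sided form $P(|\phi - \mathbb{E}\phi| \ge t) \le 2\exp(-2t^2 / \sum_i c_i^2)$, which may differ in constants or presentation from the paper's Theorem 3.1. It is applied to the empirical 0/1 error of a single fixed classifier, for which changing one sample moves the error by at most $c_i = 1/m$. The helper names, the true error rate `p = 0.3`, and the simulation settings are all assumptions for illustration, and the sketch does not cover the uniform bound over a whole function class, which would additionally involve the growth function.

```python
import math
import random


def mcdiarmid_two_sided_bound(t: float, c: list[float]) -> float:
    """Textbook McDiarmid bound on P(|phi - E[phi]| >= t) for bounded differences c."""
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / sum(ci * ci for ci in c)))


def empirical_error(losses: list[int]) -> float:
    """Mean 0/1 loss; changing one entry shifts it by at most 1/len(losses)."""
    return sum(losses) / len(losses)


def simulate_tail(m: int, p: float, t: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|empirical error - p| >= t) for i.i.d. Bernoulli(p) losses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        losses = [1 if rng.random() < p else 0 for _ in range(m)]
        if abs(empirical_error(losses) - p) >= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    m, p, t = 200, 0.3, 0.1  # hypothetical sample size, error rate, deviation
    bound = mcdiarmid_two_sided_bound(t, [1.0 / m] * m)
    freq = simulate_tail(m, p, t, trials=2000)
    print(f"McDiarmid bound: {bound:.4f}, simulated tail frequency: {freq:.4f}")
```

With $c_i = 1/m$ the exponent becomes $-2t^2 m$, so the deviation probability decays exponentially in the sample size, which is the sense in which empirical errors behave almost deterministically on large data sets.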