Learning Neural Network Classifiers with Low Model Complexity

Jayadeva, Himanshu Pant, Mayank Sharma, Abhimanyu Dubey, Sumit Soman, Suraj Tripathi, Sai Guruju, Nihal Goalla

TL;DR

We derive a continuous and differentiable error functional for a neural network that minimizes its empirical error as well as a measure of the model complexity. Our results strongly suggest that deep learning techniques can benefit from model complexity control methods such as the LCNN learning rule.

Abstract

Modern neural network architectures for large-scale learning tasks have substantially higher model complexities, which makes understanding, visualizing and training these architectures difficult. Recent contributions to deep learning techniques have focused on architectural modifications to improve parameter efficiency and performance. In this paper, we derive a continuous and differentiable error functional for a neural network that minimizes its empirical error as well as a measure of the model complexity. The latter measure is obtained by deriving a differentiable upper bound on the Vapnik-Chervonenkis (VC) dimension of the classifier layer of a class of deep networks. Using standard backpropagation, we realize a training rule that tries to minimize the error on training samples, while improving generalization by keeping the model complexity low. We demonstrate the effectiveness of our formulation (the Low Complexity Neural Network - LCNN) across several deep learning algorithms, and a variety of large benchmark datasets. We show that hidden layer neurons in the resultant networks learn features that are crisp, and in the case of image datasets, quantitatively sharper. Our proposed approach yields benefits across a wide range of architectures, in comparison to and in conjunction with methods such as Dropout and Batch Normalization, and our results strongly suggest that deep learning techniques can benefit from model complexity control methods such as the LCNN learning rule.

Paper Structure

This paper contains 18 sections, 3 theorems, 27 equations, 9 figures, 11 tables.

Key Result

Theorem 1

Consider a single layer neural network classifier with $n$ inputs and one bias term. The VC dimension $\gamma$ of this network is bounded from above by $(n + 1)$.
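The bound in Theorem 1 matches the classical result for a linear threshold unit, and a small check shows that it is attained. The sketch below is illustrative and is not taken from the paper: it assumes a sign-threshold unit $\operatorname{sign}(w^\top x + b)$ and the function names are hypothetical. It takes the $n + 1$ points $0, e_1, \dots, e_n$ in $\mathbb{R}^n$ and constructs, for every one of the $2^{n+1}$ labelings, weights that realise it. This demonstrates only that $n + 1$ points can be shattered; it does not prove the upper bound itself.

```python
from itertools import product


def shattering_weights(labels):
    """Return (w, b) that realises `labels` on the points 0, e_1, ..., e_n.

    labels[0] is the label of the origin, labels[i] the label of e_i.
    With b = y_0 and w_i = y_i - y_0 we get w.e_i + b = y_i and w.0 + b = y_0.
    """
    y0 = labels[0]
    b = float(y0)
    w = [float(yi - y0) for yi in labels[1:]]
    return w, b


def predict(w, b, x):
    """Sign-threshold unit: +1 if w.x + b > 0, otherwise -1."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else -1


def count_realised_labelings(n):
    """Count how many of the 2^(n+1) labelings of 0, e_1, ..., e_n are realised."""
    origin = [0.0] * n
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    points = [origin] + basis
    realised = 0
    for labels in product((-1, 1), repeat=n + 1):
        w, b = shattering_weights(labels)
        if all(predict(w, b, x) == y for x, y in zip(points, labels)):
            realised += 1
    return realised


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, count_realised_labelings(n), 2 ** (n + 1))
```

For each `n`, the count equals `2 ** (n + 1)`, so these $n + 1$ points are shattered and the VC dimension of such a unit is at least $n + 1$.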

Figures (9)

  • Figure 1: Flowchart for experimental procedure.
  • Figure 2: Effect of dataset size on LCNN training time.
  • Figure 3: Effect of dataset size on LCNN accuracy.
  • Figure 4: Convergence of AlexNet on ImageNet for different values of hyperparameter $C$ in the LCNN objective. The performance rises initially during training and persists until convergence, indicating that the LCNN learns a good model early with few training samples.
  • Figure 5: t-SNE 2D visualizations of a few samples from the test sets of CIFAR10 and MNIST. We see that model complexity control consistently enforces crisper, more distinct clustering of classes in feature space.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof