Mapping Networks

Lord Sen, Shyamapada Mukherjee

TL;DR

The Mapping Theorem, enforced by a dedicated Mapping Loss, shows both theoretically and in practice that a mapping exists from a compact, trainable latent space to the target weight space.

Abstract

The escalating parameter counts of modern deep learning models pose a fundamental challenge to efficient training and to mitigating overfitting. We address this by introducing \emph{Mapping Networks}, which replace the high-dimensional weight space with a compact, trainable latent vector, based on the hypothesis that the trained parameters of large networks reside on smooth, low-dimensional manifolds. The Mapping Theorem, enforced by a dedicated Mapping Loss, then shows the existence of a mapping from this latent space to the target weight space, both theoretically and in practice. Mapping Networks significantly reduce overfitting and achieve performance comparable to or better than the target network across complex vision and sequence tasks, including Image Classification and Deepfake Detection, with a $\mathbf{99.5\%}$ (i.e., around $500\times$) reduction in trainable parameters.
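To make the core idea concrete, the following is a minimal sketch of generating all weights of a small target network from a low-dimensional latent vector. Everything here is an illustrative assumption rather than the authors' architecture: the target MLP shapes, the latent size `latent_dim`, and in particular the fixed random linear projection `P`, which merely stands in for the mapping (the abstract does not specify its form or the Mapping Loss).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target network: a 2-layer MLP, 64 -> 128 -> 10.
shapes = [(64, 128), (128,), (128, 10), (10,)]
n_target = sum(int(np.prod(s)) for s in shapes)  # total target parameters

# The latent vector is the only trainable quantity in this sketch.
latent_dim = 32  # assumed value, chosen for illustration
z = rng.normal(size=latent_dim)

# Assumed mapping: a fixed random linear projection from latent space
# to the flattened target weight space (a stand-in, not the paper's mapping).
P = rng.normal(size=(n_target, latent_dim)) / np.sqrt(latent_dim)

def generate_weights(z):
    """Map the latent vector to a list of target-network parameter arrays."""
    flat = P @ z
    params, offset = [], 0
    for s in shapes:
        size = int(np.prod(s))
        params.append(flat[offset:offset + size].reshape(s))
        offset += size
    return params

def forward(x, z):
    """Run the target MLP using weights generated from z."""
    w1, b1, w2, b2 = generate_weights(z)
    h = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
    return h @ w2 + b2

x = rng.normal(size=(4, 64))
logits = forward(x, z)

# Fraction of parameters that no longer need to be trained directly.
reduction = 1.0 - latent_dim / n_target
print(n_target, latent_dim, f"{reduction:.2%}")
```

In this toy setting only `z` would be updated during training, so the trainable parameter count drops from `n_target` to `latent_dim`; the actual reduction reported in the abstract depends on the authors' own mapping and target networks.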

Paper Structure

This paper contains 26 sections, 30 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: State of the existing works and ours in this field.
  • Figure 2: Parameter update snapshots showing distinct parameter manifolds in CNN evolution.
  • Figure 3: General Architecture for Mapping Networks.
  • Figure 4: Process of modulation of Mapping weights and training of latent vector z from epoch p to p+1.
  • Figure 5: Training strategies used for Mapping Networks.