Gibbs randomness-compression proposition: An efficient deep learning

M. Süzen

TL;DR

The so-called Gibbs randomness-compression proposition is formulated, signifying a randomness-compression relationship via Gibbs entropy. It is supported by experimental evidence showing a very high correlation between learning performance and Gibbs entropy over compression ratios.

Abstract

A proposition connecting randomness and compression is put forward via the Gibbs entropy over a set of measurement vectors associated with a compression process. The proposition states that a lossy compression process is equivalent to directed randomness that preserves information content. It originated from behavior observed in the newly proposed Dual Tomographic Compression (DTC) compress-train framework. DTC is akin to tomographic reconstruction of layer weight matrices by building compressed-sensing projections, the so-called weight rays. This tomographic approach is applied to the previous and next layers in a dual fashion, which triggers neuronal-level pruning. The compress-train scheme is applied iteratively and acts as a smart neural architecture search, also called compression-aware training. The experiments demonstrated the utility of this dual tomography during training: the method accelerates and supports the lottery ticket hypothesis. However, random compress-train iterations showed similar performance, which pointed to a connection between randomness and compression from a statistical physics perspective. This led us to formulate the so-called Gibbs randomness-compression proposition, signifying a randomness-compression relationship via Gibbs entropy. The proposition is supported by experimental evidence showing a very high correlation between learning performance and Gibbs entropy over compression ratios. Practically, the DTC framework provides a promising approach for massively energy- and resource-efficient deep learning training.
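
To make the quantities in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the Gibbs entropy is evaluated as $-\sum_i p_i \ln p_i$ (with $k_B = 1$) over a histogram of the measurement vector, that measurements come from an i.i.d. Gaussian sensing matrix, and that plain magnitude pruning stands in for the DTC pruning step. All function names, the bin count, and the number of measurements are hypothetical choices made for illustration.

```python
import math
import random


def gibbs_entropy(values, bins=10):
    """Entropy -sum(p * ln p) of a histogram over `values` (k_B set to 1).

    Assumption: histogram binning is one possible way to obtain probabilities.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values fall into a single state
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights.

    Stand-in for the DTC pruning step, which is not reproduced here.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def random_projection(weights, n_measurements, rng):
    """Measurement vector y = Phi w with an i.i.d. Gaussian sensing matrix Phi."""
    return [
        sum(rng.gauss(0.0, 1.0) * w for w in weights)
        for _ in range(n_measurements)
    ]


def entropy_over_sparsity(weights, sparsities, n_measurements=64, seed=0):
    """Gibbs entropy of the measurement vector at each sparsity level."""
    rng = random.Random(seed)
    return [
        gibbs_entropy(
            random_projection(magnitude_prune(weights, s), n_measurements, rng)
        )
        for s in sparsities
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    rng = random.Random(42)
    layer = [rng.gauss(0.0, 1.0) for _ in range(256)]  # toy weight vector
    levels = [0.0, 0.25, 0.5, 0.75, 0.9]
    for s, h in zip(levels, entropy_over_sparsity(layer, levels)):
        print(f"sparsity={s:.2f}  entropy={h:.4f}")
```

Given accuracies measured at the same sparsity levels, `pearson(accuracies, entropies)` would give the kind of performance-entropy correlation the abstract refers to. No accuracy values are supplied here because they would have to come from an actual training run.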

Paper Structure

This paper contains 14 sections, 1 theorem, 3 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 5.1

Consider a sequential lossy compression process consisting of $M$ consecutive compression cycles. At each compression cycle the data size, representing model size, is reduced by $s_i$ percent relative to the uncompressed size, $i=1,\ldots,M$, where $s_{0}=1.0$. At each compression cycle, a measure vector ${\bf y}_{

Figures (5)

  • Figure 1: DTC accuracy over different sparsity levels in train-compress iterative pruning.
  • Figure 2: DTC accuracy over different sparsity levels in train-compress iterative pruning. Inset shows a magnified view of higher sparsity levels.
  • Figure 3: Weight-rays, total of reconstructed weight over different sparsity levels.
  • Figure 4: Previous layer Gibbs entropy of computed measurements within DTC over compression process.
  • Figure 5: Next layer Gibbs entropy of computed measurements within DTC over compression process.

Theorems & Definitions (2)

  • Theorem 5.1: Gibbs-randomness proposition
  • proof