Convergence for Discrete Parameter Update Schemes

Paul Wilson, Fabio Zanasi, George Constantinides

TL;DR

This work proposes a fully discrete update framework for training, replacing quantised continuous updates with integer-valued update rules. It establishes a general convergence theorem under mild Lipschitz-gradient assumptions and bounds on the discrete update, showing that discrete updates with a fixed learning rate converge up to an explicit bound on the average gradient norm. The authors instantiate the framework with a zero-inflated multinomial (ZIM) update, deriving explicit moment and convergence results that scale favourably with dimension compared to prior fully discrete methods such as BOLD. Empirical evaluation on MNIST shows that discrete updates converge for CNNs and ResNets with a small accuracy penalty, illustrating the practicality of discrete learning for potentially memory-efficient training. The work opens avenues for robust, fully discrete learning systems and applications to inherently discrete architectures.

Abstract

Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on discretising real-valued updates. We introduce an alternative approach where the update rule itself is discrete, avoiding the quantisation of continuous updates by design. We establish convergence guarantees for a general class of such discrete schemes, and present a multinomial update rule as a concrete example, supported by empirical evaluation. This perspective opens new avenues for efficient training, particularly for models with inherently discrete structure.

Paper Structure

This paper contains 6 sections, 7 theorems, 28 equations, 1 figure.

Key Result

Proposition 4

Let $F$ be a (possibly non-convex) function, and fix $\alpha_k = \bar{\alpha}$ for all $k \in \mathbb{N}$, where … Then we have …

Figures (1)

  • Figure 1: MNIST results: both convolutional and ResNet models converge over 10 epochs under our discrete update (ZIM), compared to SGD. Each curve is averaged over 10 runs; shaded regions show $\pm 1$ std.

Theorems & Definitions (10)

  • Definition 1: Discrete Stochastic Gradient Update
  • Proposition 4
  • Definition 5: Zero-inflated multinomial
  • Definition 6: ZIM update
  • Proposition 7
  • Proposition 8
  • Proposition 9
  • Proposition 10: First and Second Moments of $\mathsf{ZIMultinomial}$
  • Proposition 11: First and Second Moments of ZIM update
  • Proposition 12
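
To make the idea of an integer-valued update concrete, the following is a minimal sketch of a zero-inflated, multinomial-style step. It is an illustration under stated assumptions, not the paper's Definition 5 or Definition 6, which are not reproduced on this page. The function name `zim_style_update` and the parameters `n_trials` and `p_zero` are hypothetical. The sketch assumes that, with probability `p_zero`, the update is all zeros, and that otherwise `n_trials` unit steps are allocated to coordinates with probability proportional to the gradient magnitude, each step moving against the sign of the gradient.

```python
import numpy as np


def zim_style_update(grad, n_trials, p_zero, rng):
    """Hypothetical zero-inflated multinomial-style integer update.

    Assumption for illustration only; the paper's exact definition may differ.
    Returns an int64 vector whose entries are whole-number steps.
    """
    grad = np.asarray(grad, dtype=np.float64)
    l1 = np.abs(grad).sum()
    # Zero-inflation branch (or no gradient signal): emit the all-zero update.
    if l1 == 0.0 or rng.random() < p_zero:
        return np.zeros(grad.shape, dtype=np.int64)
    # Allocate n_trials unit steps across coordinates, weighted by |grad|.
    probs = np.abs(grad) / l1
    counts = rng.multinomial(n_trials, probs)
    # Each unit step moves against the sign of its gradient component.
    return -np.sign(grad).astype(np.int64) * counts


rng = np.random.default_rng(0)
theta = np.zeros(4, dtype=np.int64)        # integer-valued parameters
grad = np.array([0.5, -1.5, 0.0, 2.0])     # a made-up gradient
theta = theta + zim_style_update(grad, n_trials=3, p_zero=0.25, rng=rng)
```

Under these assumptions the expected step is `-(1 - p_zero) * n_trials * grad / ||grad||_1`, so the update points along the negative gradient on average while the parameters stay integers throughout.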