Deep neural networks are robust to weight binarization and other non-linear distortions

Paul Merolla, Rathinakumar Appuswamy, John Arthur, Steve K. Esser, Dharmendra Modha

TL;DR

The paper shows that deep networks trained with weight projections are robust at test time not only to quantization but to a wide range of weight distortions, including additive noise, multiplicative noise, and non-linear projections. It introduces a flexible projection framework and a novel stochastic projection rule that reaches 7.64% test error on CIFAR-10 without data augmentation, and it reports competitive ImageNet performance. The authors argue that projecting weights during training guides the optimization toward robust neighborhoods. They also find that the projection can be altered or removed entirely (for example, replaced by simple weight clipping) and networks still retain a base level of robustness. These findings have practical implications for low-precision hardware and potentially for neuromorphic computing, suggesting a broader tolerance to weight distortions than previously recognized.

Abstract

Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is just the tip of the iceberg: these same networks, during testing, also exhibit a remarkable robustness to distortions beyond quantization, including additive and multiplicative noise, and a class of non-linear projections where binarization is just a special case. To quantify this robustness, we show that one such network achieves 11% test error on CIFAR-10 even with 0.68 effective bits per weight. Furthermore, we find that a common training heuristic--namely, projecting quantized weights during backpropagation--can be altered (or even removed) and networks still achieve a base level of robustness during testing. Specifically, training with weight projections other than quantization also works, as does simply clipping the weights, both of which have never been reported before. We confirm our results for CIFAR-10 and ImageNet datasets. Finally, drawing from these ideas, we propose a stochastic projection rule that leads to a new state of the art network with 7.64% test error on CIFAR-10 using no data augmentation.
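The training heuristic mentioned in the abstract (projecting weights during backpropagation, plus clipping) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy linear model, the squared loss, the learning rate, and the names `sign_projection` and `projected_sgd_step` are all assumptions made for illustration. The only idea taken from the text is that the forward pass and gradient use projected weights, while the update is applied to a high-precision copy that is then clipped.

```python
import numpy as np

def sign_projection(w):
    """Binarize weights to {-1, +1}. Mapping zero to +1 is an arbitrary choice."""
    return np.where(w >= 0.0, 1.0, -1.0)

def projected_sgd_step(w, x, y, lr=0.1, project=sign_projection, clip=1.0):
    """One SGD step on a toy linear model with squared loss (hypothetical setup).

    The forward pass and the gradient use the projected weights; the update is
    applied to the high-precision weights, which are then clipped.
    """
    w_proj = project(w)
    err = x @ w_proj - y               # forward pass with projected weights
    grad = x.T @ err / len(y)          # gradient w.r.t. the projected weights
    w_new = w - lr * grad              # update the high-precision copy
    if clip is not None:
        w_new = np.clip(w_new, -clip, clip)   # keep weights in [-clip, clip]
    return w_new

# Toy data whose target weights happen to be binary (an assumption of this demo).
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
w_true = sign_projection(rng.normal(size=8))
y = x @ w_true

w = rng.uniform(-0.1, 0.1, size=8)
for _ in range(200):
    w = projected_sgd_step(w, x, y)

print("high-precision weights:", np.round(w, 3))
print("binarized weights:     ", sign_projection(w))
```

Passing `project=lambda w: w` gives the clipping-only variant, and `clip=None` disables clipping, which roughly mirrors the altered and removed cases the abstract describes.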

Paper Structure

This paper contains 12 sections, 6 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: (A) CIFAR-10 network trained using $\mathrm{Sign}$ projection (Tr-Sign-C) where each Conv is specified as $x$-$y$-$n_{in}$-$n_{out}$, and FC as $n_{in}$-$n_{out}$. There are two test scenarios, Te-None (top half with fp32), and Te-Sign (bottom half with binary). Four weight histograms post training are shown on the right. (B) Test error during training, evaluated every two epochs for Te-None and Te-Sign. Insets show weights for two corresponding filters post training. (C) Average absolute differences between $W_k$ and $\mathrm{Sign}(W_k)$ at each layer. (D) Correlation coefficient between neuron activity at each layer for a minibatch during two forward passes, one evaluated using $W_k$ and the other using $\mathrm{Sign}(W_k)$.
  • Figure 2: DNNs are robust to different types of weight distortions. Six networks were trained using different projections and clip values. After training, each network (including a control network) was tested with additive Gaussian noise (A) and multiplicative noise (B) applied to the weights, and with a distortion in which each weight is raised to a power (C). Weight histograms for Conv2 of Tr-Sign-C are shown, where weights are projected using $\mathrm{Power}$ for four values of $\beta$ (D).
  • Figure 3: Two versions of AlexNet were trained without weight projections, Tr-None-NC which does not use weight clipping (A) and Tr-None-C which uses weight clipping (B). Test error was computed every $1$K iterations for projections $\mathrm{None}$, $\mathrm{Round}$, and $\mathrm{Sign}$. With weight clipping alone, the network becomes robust to weight quantizations.
  • Figure 4: (left) AlexNet trained via $\mathrm{StochM3}$ is robust to power distortions. (right) Test error of our AlexNet for different weight distortions (columns), compared to recent AlexNet models that use binary weights (rows).
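The test-time distortions and layer-wise measurements named in the captions above can be sketched in NumPy as follows. The exact parameterizations are assumptions, since this page does not give them: additive noise is taken as `w + N(0, sigma)`, multiplicative noise as `w * (1 + N(0, sigma))`, and the power distortion as `sign(w) * |w|**beta`, a form chosen so that `beta = 0` recovers $\mathrm{Sign}$ and `beta = 1` leaves the weights unchanged. The function names and the single random ReLU layer are hypothetical.

```python
import numpy as np

def add_gaussian_noise(w, sigma, rng):
    """Additive distortion: w + N(0, sigma). Assumed parameterization."""
    return w + rng.normal(0.0, sigma, size=w.shape)

def multiply_noise(w, sigma, rng):
    """Multiplicative distortion: w * (1 + N(0, sigma)). Assumed parameterization."""
    return w * (1.0 + rng.normal(0.0, sigma, size=w.shape))

def power_distortion(w, beta):
    """Raise each weight magnitude to a power, keeping its sign (assumed form)."""
    return np.sign(w) * np.abs(w) ** beta

def mean_abs_difference(w, w_distorted):
    """Average absolute difference between two weight tensors (cf. Figure 1C)."""
    return float(np.mean(np.abs(w - w_distorted)))

def activation_correlation(x, w, w_distorted):
    """Correlation between ReLU activations of one layer under two weight sets
    (cf. Figure 1D), computed here on a single toy layer."""
    a = np.maximum(x @ w, 0.0)
    b = np.maximum(x @ w_distorted, 0.0)
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=(16, 8))   # toy weights already clipped to [-1, 1]
x = rng.normal(size=(32, 16))              # toy minibatch of layer inputs

diff = mean_abs_difference(w, np.sign(w))
corr = activation_correlation(x, w, np.sign(w))
print("mean |W - Sign(W)|:", diff)
print("activation correlation:", corr)
```

Sweeping `beta` in `power_distortion` between 0 and larger values moves continuously between binarized weights and increasingly peaked weight distributions, which is one way to read the "binarization is just a special case" remark in the abstract.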