Lipschitz Constant Meets Condition Number: Learning Robust and Compact Deep Neural Networks

Yangqi Feng, Shing-Ho J. Lin, Baoyuan Gao, Xian Wei

TL;DR

This work develops novel joint constraints that adjust the weight distribution of networks, namely the Transformed Sparse Constraint joint with Condition Number Constraint (TSCNC), which relies on a smoothed distribution and differentiable constraint functions to reduce the condition number and thus avoid ill-conditioned weight matrices.

Abstract

Recent research has shown that heavily compressing Deep Neural Networks (DNNs), e.g., massively pruning the weight matrices of a DNN, causes a severe drop in accuracy and increases susceptibility to adversarial attacks. Integrating network pruning into an adversarial training framework has been proposed to promote adversarial robustness. However, a highly pruned weight matrix tends to be ill-conditioned, i.e., pruning increases the condition number of the weight matrix, which aggravates the vulnerability of a DNN to input noise. Although a highly pruned weight matrix is considered able to lower the upper bound of the local Lipschitz constant and thus tolerate large distortion, its ill-conditionedness results in a non-robust DNN model. To overcome this challenge, this work develops novel joint constraints that adjust the weight distribution of networks, namely the Transformed Sparse Constraint joint with Condition Number Constraint (TSCNC), which relies on a smoothed distribution and differentiable constraint functions to reduce the condition number and thus avoid ill-conditioned weight matrices. Furthermore, our theoretical analyses reveal the relationship between the condition number and the local Lipschitz constant of the weight matrix: the sharply increasing condition number becomes the dominant factor that restricts the robustness of over-sparsified models. Extensive experiments on several public datasets show that the proposed constraints significantly improve the robustness of DNNs with high pruning rates.
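The abstract's central observation, that heavy pruning tends to push a weight matrix toward ill-conditioning, can be checked informally with a small experiment. The sketch below is not the paper's method. It assumes plain global magnitude pruning on a random Gaussian matrix, and the helper names `magnitude_prune` and `condition_number` are hypothetical, introduced only for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries (assumed pruning rule, not TSCNC)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of entries to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value; entries at or below it are zeroed.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)


def condition_number(weights):
    """l2 condition number: largest singular value over the smallest."""
    s = np.linalg.svd(weights, compute_uv=False)  # sorted in descending order
    smallest = s[-1]
    return np.inf if smallest == 0 else s[0] / smallest


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # stand-in for one layer's weight matrix

for sparsity in (0.0, 0.5, 0.9, 0.95):
    pruned = magnitude_prune(W, sparsity)
    print(f"sparsity={sparsity:.2f}  cond={condition_number(pruned):.3e}")
```

An infinite value means pruning has made the matrix exactly singular, for example by zeroing an entire row. A trained layer will behave differently from a random matrix, so the printed numbers only illustrate the trend and do not reproduce the paper's figures.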

Paper Structure

This paper contains 29 sections, 5 theorems, 21 equations, 6 figures, 6 tables, 1 algorithm.

Key Result

Proposition 3.1

Let $\hat{y}=\arg \max_{k \in [1:c]} g_{k}\left(\mathbf{x}\right)$, and $\frac{1}{p}+\frac{1}{q}=1$, then for any $\Delta_{\mathbf{x}} \in B_{p}\left( \mathbf{0}, r\right)$, $p \in \mathbb{R}^+$ and a set of Lipschitz continuous functions $\{g_k:\mathbb{R}^n \to \mathbb{R} \}$, with: it holds that $\hat{y}=\arg \max_{k \in [1:c]} g_{k}(\mathbf{x} + \Delta_{\mathbf{x}})$, which means the classific

Figures (6)

  • Figure 1: Comparison of condition number and robust accuracy of three methods at different pruning rates (90% vs. 95%). The histogram 'Acc' shows the robust accuracy of the different methods at the same pruning rate, while the line graph 'Cond' shows the condition number of the model.
  • Figure 2: a) The ill-conditioned weight space in the fully connected layer; b) The parameter (value) distribution of the pruned model at $90\%$ sparsity; c) The parameter distribution of the pruned model at $95\%$ sparsity. See the per-layer pruning ratios in the Supplementary Materials.
  • Figure 3: The variation of the condition number during adversarial training with VGG-16, where 'Standard' indicates the pretrained network.
  • Figure 4: The impact of different values of $\lambda$ on robust accuracy, with the pruning ratio set to $90\%$ and $95\%$, respectively.
  • Figure 5: Results at different sparsity levels under various attack methods, using WRN-28-10 with PGD adversarial training on CIFAR-10, compared with HYDRA and MAD.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Proposition 3.1: weng2018evaluatingl
  • Theorem 3.2: Sparsity and Robustness of nonlinear DNN guo2018sparse
  • Definition 3.3: $\ell_2$-norm condition number
  • Theorem 3.4
  • Proof (Sketch)
  • Proposition 3.5
  • Proposition 3.6
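
This listing gives only the title of Definition 3.3, not its statement. For orientation, the standard textbook form of the $\ell_2$-norm condition number of a weight matrix $\mathbf{W}$ is shown below. The symbols $\kappa_2$, $\sigma_{\max}$, $\sigma_{\min}$ and $\mathbf{W}^{\dagger}$ (the pseudo-inverse) are notation assumed here and may differ from the paper's.

$$
\kappa_2(\mathbf{W}) \;=\; \|\mathbf{W}\|_2 \,\|\mathbf{W}^{\dagger}\|_2 \;=\; \frac{\sigma_{\max}(\mathbf{W})}{\sigma_{\min}(\mathbf{W})},
$$

where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest nonzero singular values of $\mathbf{W}$. For a single linear map $\mathbf{x} \mapsto \mathbf{W}\mathbf{x}$, the $\ell_2$ Lipschitz constant is $\sigma_{\max}(\mathbf{W})$ alone. Lowering $\sigma_{\max}$ therefore does not improve $\kappa_2$ if $\sigma_{\min}$ shrinks faster, which is the tension between sparsity and conditioning that the abstract describes.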