Dual Convexified Convolutional Neural Networks

Site Bai, Chuyang Ke, Jean Honorio

TL;DR

A novel weight recovery algorithm is proposed. It takes the dual solution and the kernel information as input and recovers the linear weight and the output of the convolutional layer, rather than the convolutional weight parameters.

Abstract

We propose the framework of dual convexified convolutional neural networks (DCCNNs). In this framework, we first introduce a primal learning problem motivated by convexified convolutional neural networks (CCNNs), and then construct the dual convex training program through careful analysis of the Karush-Kuhn-Tucker (KKT) conditions and Fenchel conjugates. Our approach reduces the computational overhead of constructing a large kernel matrix and, more importantly, eliminates the ambiguity of factorizing the matrix. Due to the low-rank structure in CCNNs and the related subdifferential of nuclear norms, there is no closed-form expression to recover the primal solution from the dual solution. To overcome this, we propose a novel weight recovery algorithm, which takes the dual solution and the kernel information as input and recovers the linear weight and the output of the convolutional layer, rather than the convolutional weight parameters. Furthermore, our recovery algorithm exploits the low-rank structure and indirectly imposes a small number of filters, which reduces the parameter size. As a result, DCCNNs inherit all the statistical benefits of CCNNs while enjoying a more formal and efficient workflow.

Paper Structure

This paper contains 38 sections, 14 theorems, 84 equations, 1 figure, 3 tables, 3 algorithms.

Key Result

Theorem 1

The dual of the primal problem is given by: … in which the $\alpha_i$'s are the dual variables, $\ell^\ast(\cdot)$ is the Fenchel conjugate of the loss function $\ell(\cdot)$ (for a detailed illustration of $\ell^\ast(\cdot)$, the Fenchel conjugates of some common losses are included in the appendix), $K\left(\mathbf{x}_i, \mathbf{x}_j\right)$ …
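
For readers unfamiliar with the Fenchel conjugate used in Theorem 1, the block below recalls its standard definition and works it out for one loss. The squared loss, the label $y$, and the scalar arguments $z$ and $u$ are assumptions made for this illustration; they are not taken from the paper's appendix.

```latex
% Standard definition of the Fenchel conjugate of a loss \ell
\ell^\ast(u) \;=\; \sup_{z \in \mathbb{R}} \bigl( u z - \ell(z) \bigr)

% Illustrative case (assumed here): squared loss with label y
\ell(z) = \tfrac{1}{2}(z - y)^2
\quad\Longrightarrow\quad
\ell^\ast(u) = \tfrac{1}{2}u^2 + u y

% Derivation: the supremum is attained where u - (z - y) = 0, i.e. z = y + u,
% which gives u(y + u) - \tfrac{1}{2}u^2 = \tfrac{1}{2}u^2 + u y.
```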

Figures (1)

  • Figure 1: (a) In the primal framework, the basis function matrix $\Phi(x)$ is approximated by a matrix $Q$ obtained from a factorization of the kernel matrix such that $K = QQ^\top$. The convolutional weight $W$ and the linear weight $L$ are multiplied together into a matrix $A$, whose low rank is enforced by a nuclear norm constraint. $W$ is recovered by a low-rank approximation of the optimized $A$. (b) The dual framework uses $K(x,x_i)$ without the ambiguous factorization, and recovers the weights from the optimized dual variable $\alpha$. The primal solution $A$ cannot be directly recovered because $A$ has no closed-form expression in terms of $\alpha$. Therefore, the dual framework recovers the linear weight $L$ and computes the convolution output $\Phi(x)^\top W$ directly, without forming $W$ or $\Phi(x)$.
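
The factorization ambiguity mentioned in panel (a) can be illustrated numerically. The sketch below is not the paper's code: the toy data, the Gaussian kernel, the matrix sizes, and all variable names are assumptions chosen for illustration. It shows that if $K = QQ^\top$ then $QR$ is an equally valid factor for any orthogonal $R$, and that a rank-$r$ approximation of a matrix $A$ can be obtained from a truncated SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs and a Gaussian (RBF) kernel matrix K (illustrative assumptions).
X = rng.normal(size=(6, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)

# One factor Q with K = Q Q^T, built from the eigendecomposition of K.
eigvals, eigvecs = np.linalg.eigh(K)
Q = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

# Any orthogonal R yields another factor, since (Q R)(Q R)^T = Q Q^T.
R, _ = np.linalg.qr(rng.normal(size=(6, 6)))
Q_rotated = Q @ R

factorization_error = np.abs(Q @ Q.T - K).max()
rotated_error = np.abs(Q_rotated @ Q_rotated.T - K).max()
factor_gap = np.abs(Q - Q_rotated).max()  # the two factors themselves differ

# Low-rank step: rank-r truncated SVD of a nearly rank-2 toy matrix A.
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
A = A + 0.01 * rng.normal(size=(6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 2
A_r = (U[:, :r] * s[:r]) @ Vt[:r]

# The Frobenius error of the truncation equals the norm of the dropped singular values.
truncation_error = np.linalg.norm(A - A_r)
dropped_energy = np.sqrt((s[r:] ** 2).sum())
```

Both `Q` and `Q_rotated` reproduce the same `K`, so a method that depends on the factor itself has to pick one arbitrarily. A formulation that only evaluates kernel values, as panel (b) describes, does not face that choice.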

Theorems & Definitions (21)

  • Theorem 1
  • Lemma 1
  • Lemma 2: Subdifferential of Nuclear Norm (Watson, 1992)
  • Theorem 2
  • Remark 1
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem \ref{dual_theorem}
  • ...and 11 more
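
Lemma 2 is listed above by title only, and the paper's exact statement is not reproduced on this page. For orientation, the standard characterization of the nuclear norm subdifferential is sketched below. The notation is assumed for this illustration: a compact SVD $A = U \Sigma V^\top$ and an auxiliary matrix $Z$, with $\|\cdot\|_\ast$ the nuclear norm and $\|\cdot\|_2$ the spectral norm.

```latex
% Standard characterization, with A = U \Sigma V^\top a compact SVD (assumed notation)
\partial \|A\|_\ast
  \;=\;
  \bigl\{\, U V^\top + Z \;:\; U^\top Z = 0,\;\; Z V = 0,\;\; \|Z\|_2 \le 1 \,\bigr\}
```

When $A$ is rank-deficient, $Z$ is not pinned down, so the subdifferential is a set rather than a single matrix. This is consistent with the abstract's remark that the primal solution has no closed-form expression in terms of the dual solution.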