Dual Convexified Convolutional Neural Networks

Site Bai; Chuyang Ke; Jean Honorio

TL;DR

We propose a novel weight recovery algorithm that takes the dual solution and the kernel information as input and recovers the linear weight and the output of the convolutional layer, rather than the convolutional weight parameter itself.

Abstract

We propose the framework of dual convexified convolutional neural networks (DCCNNs). In this framework, we first introduce a primal learning problem motivated by convexified convolutional neural networks (CCNNs), and then construct the dual convex training program through careful analysis of the Karush-Kuhn-Tucker (KKT) conditions and Fenchel conjugates. Our approach reduces the computational overhead of constructing a large kernel matrix and, more importantly, eliminates the ambiguity of factorizing that matrix. Because of the low-rank structure in CCNNs and the associated subdifferential of the nuclear norm, there is no closed-form expression that recovers the primal solution from the dual solution. To overcome this, we propose a novel weight recovery algorithm that takes the dual solution and the kernel information as input and recovers the linear weight and the output of the convolutional layer, rather than the convolutional weight parameter itself. Furthermore, our recovery algorithm exploits the low-rank structure and indirectly imposes a small number of filters, which reduces the parameter size. As a result, DCCNNs inherit all the statistical benefits of CCNNs while enjoying a more formal and efficient workflow.
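For context on the remark about the nuclear norm, the standard characterization of its subdifferential is sketched below. The symbols $U$, $\Sigma$, $V$, $Z$, and $r$ are introduced here for illustration and are not taken from the paper; $Z$ is used instead of $W$ to avoid a clash with the convolutional weight.

```latex
% Compact SVD of a rank-r matrix A (illustrative notation)
A = U \Sigma V^\top, \qquad U^\top U = V^\top V = I_r .

% Subdifferential of the nuclear norm at A
\partial \|A\|_\ast
  = \left\{\, U V^\top + Z \;:\; U^\top Z = 0,\;\; Z V = 0,\;\; \|Z\|_2 \le 1 \,\right\} .
```

When $A$ is rank-deficient, the component $Z$ is not pinned down by $A$ alone, so the subdifferential is a set rather than a single matrix. This is one way to see why optimality conditions involving it may not yield a closed-form map from the dual solution back to $A$.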

Paper Structure

This paper contains 38 sections, 14 theorems, 84 equations, 1 figure, 3 tables, and 3 algorithms.

Introduction
Preliminaries
Notation
Mathematical Formulation of Convolutional Neural Networks
Convexified Convolutional Neural Networks
Main Results
Dual Optimization Problem
Recovering the Parameters
Recovering the Linear Weight
Recovering the convolutional weight
Extension to Multiclass Classification
Experiments
Conclusion
Detailed Proofs for Binary Classification
Proof of Theorem 1
...and 23 more sections

Key Result

Theorem 1

The dual problem of Eq. (primal) is given by a convex program in which the $\alpha_i$ are the dual variables, $\ell^\ast(\cdot)$ is the Fenchel conjugate of the loss function $\ell(\cdot)$ (the Fenchel conjugates of some common losses are given in Appendix app_conjugate), $K\left(\mathbf{x}_i, \mathbf{x}_j\right)$ ...
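To make the Fenchel conjugate $\ell^\ast(u) = \sup_z \left(uz - \ell(z)\right)$ concrete, the following minimal sketch checks a closed-form conjugate numerically. The squared loss, the label `y`, the grid, and the helper `conjugate_numeric` are all assumptions chosen for illustration; the paper's own loss conventions may differ.

```python
import numpy as np

def conjugate_numeric(loss, u, grid):
    """Approximate sup_z (u*z - loss(z)) by maximizing over a finite grid."""
    return np.max(u * grid - loss(grid))

y = 1.0  # hypothetical label, for illustration only

def squared_loss(z):
    # Assumed loss: l(z) = 0.5 * (z - y)^2
    return 0.5 * (z - y) ** 2

def closed_form(u):
    # The maximizer is z = y + u, which gives l*(u) = 0.5 * u^2 + u * y
    return 0.5 * u ** 2 + u * y

grid = np.linspace(-50.0, 50.0, 200001)
us = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
numeric = np.array([conjugate_numeric(squared_loss, u, grid) for u in us])
exact = closed_form(us)
max_gap = float(np.max(np.abs(numeric - exact)))
print(max_gap)
```

The grid search is only a sanity check. In a dual program the conjugate would normally be used in closed form.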

Figures (1)

Figure 1: (a) In the primal framework, the basis function matrix $\Phi(x)$ is approximated by a matrix $Q$ obtained by factorizing the kernel matrix so that $K = QQ^\top$. The convolutional weight $W$ and the linear weight $L$ are multiplied together into a matrix $A$, whose low rank is enforced by a nuclear norm constraint. $W$ is recovered by a low-rank approximation of the optimized $A$. (b) The dual framework uses $K(x,x_i)$ directly, avoiding the ambiguous factorization, and recovers the weights from the optimized dual variable $\alpha$. The primal solution $A$ cannot be recovered directly because $A$ has no closed-form expression in terms of $\alpha$. Therefore, the dual framework recovers the linear weight $L$ and computes the convolution output $\Phi(x)^\top W$ directly, without forming $W$ or $\Phi(x)$.
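One plausible reading of the factorization ambiguity in panel (a) is that $K = QQ^\top$ does not determine $Q$: multiplying $Q$ on the right by any orthogonal matrix leaves $K$ unchanged. The sketch below illustrates this with random data. The sizes `n` and `m` and the names `R` and `Q_alt` are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4  # hypothetical sizes: n samples, m factor columns

# A factor Q and the kernel matrix it induces.
Q = rng.standard_normal((n, m))
K = Q @ Q.T

# A random orthogonal matrix R, taken from the QR decomposition
# of a Gaussian matrix.
R, _ = np.linalg.qr(rng.standard_normal((m, m)))

# Rotating the factor changes Q but not the kernel matrix.
Q_alt = Q @ R
same_kernel = np.allclose(Q_alt @ Q_alt.T, K)
different_factor = not np.allclose(Q_alt, Q)
print(same_kernel, different_factor)
```

Any quantity computed from $Q$ alone therefore depends on an arbitrary choice of rotation, whereas the kernel values $K(x, x_i)$ used in panel (b) do not.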

Theorems & Definitions (21)

Theorem 1
Lemma 1
Lemma 2: Subdifferential of Nuclear Norm (Watson, 1992)
Theorem 2
Remark 1
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Theorem 1
...and 11 more
