Convexified Convolutional Neural Networks
Yuchen Zhang, Percy Liang, Martin J. Wainwright
TL;DR
This work introduces convexified convolutional neural networks (CCNNs), which preserve CNN-style parameter sharing while making training a convex optimization problem. Two ingredients make this possible: nonlinear filters are represented as vectors in a reproducing kernel Hilbert space (RKHS), and the resulting low-rank constraint on the parameter matrix is relaxed using the nuclear norm. For two-layer CCNNs, the authors prove a generalization bound showing that the CCNN risk converges to that of the best possible two-layer CNN, with a favorable sample complexity due to parameter sharing. They extend the approach to deeper networks via layer-wise training and demonstrate competitive performance on MNIST variants and CIFAR-10, often surpassing traditional CNNs and several non-convolutional baselines. The results suggest that convex relaxations can yield both scalable training and rigorous generalization guarantees for CNN-like architectures; formalizing the guarantees for deep CCNNs remains an open direction.
Abstract
We describe the class of convexified convolutional neural networks (CCNNs), which capture the parameter sharing of convolutional neural networks in a convex manner. By representing the nonlinear convolutional filters as vectors in a reproducing kernel Hilbert space, the CNN parameters can be represented as a low-rank matrix, which can be relaxed to obtain a convex optimization problem. For learning two-layer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN. For learning deeper networks, we train CCNNs in a layer-wise manner. Empirically, CCNNs achieve performance competitive with CNNs trained by backpropagation, SVMs, fully-connected neural networks, stacked denoising auto-encoders, and other baseline methods.
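To make the low-rank relaxation concrete, the following is a minimal sketch of one standard building block for optimizing over a nuclear-norm constraint: the Euclidean projection of a matrix onto a nuclear-norm ball, as used in projected gradient methods. It is an illustration under stated assumptions, not the authors' implementation; the function names `project_l1_nonneg` and `project_nuclear_ball` and the `radius` parameter are hypothetical and do not come from the source.

```python
import numpy as np

def project_l1_nonneg(s, radius):
    """Project a nonnegative vector s onto {x >= 0, sum(x) <= radius}."""
    if s.sum() <= radius:
        return s.copy()  # already feasible, nothing to do
    u = np.sort(s)[::-1]                 # entries in descending order
    cssv = np.cumsum(u) - radius         # cumulative sums shifted by the radius
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - cssv / idx > 0)[0][-1]  # last index that stays positive
    theta = cssv[rho] / (rho + 1)        # uniform shrinkage threshold
    return np.maximum(s - theta, 0.0)

def project_nuclear_ball(A, radius):
    """Project matrix A onto {X : ||X||_* <= radius} (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_proj = project_l1_nonneg(s, radius)  # shrink the singular values only
    return (U * s_proj) @ Vt               # rebuild with the same singular vectors

# Illustrative usage on a random matrix (sizes and radius are arbitrary assumptions).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
P = project_nuclear_ball(A, radius=1.0)
```

Because the projection only shrinks singular values, small ones are set exactly to zero, which is why a nuclear-norm constraint tends to produce low-rank solutions while keeping the feasible set convex.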
