Deep Fried Convnets
Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang
TL;DR
The paper tackles the heavy parameter burden of fully connected layers in CNNs by replacing them with an Adaptive Fastfood transform, yielding deep fried convnets that are end-to-end trainable. This approach achieves substantial memory savings with negligible or no loss in predictive performance on MNIST and ImageNet, and shows advantages over post-processing compression methods, particularly on large-scale data. The authors connect the transform to structured random projections and kernel feature approximations, providing both theoretical and empirical support for the method. Overall, Adaptive Fastfood enables memory-efficient CNNs suitable for deployment on GPUs and embedded devices, with potential further gains from optimized implementations and final-layer compression.
Abstract
The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters and therefore account for most of the memory required to store the network. Reducing the number of parameters while preserving essentially the same predictive performance is critically important for operating deep neural networks in memory-constrained environments such as GPUs or embedded devices. In this paper we show how kernel methods, in particular a single Fastfood layer, can be used to replace all fully connected layers in a deep convolutional neural network. This novel Fastfood layer is also end-to-end trainable in conjunction with convolutional layers, allowing us to combine them into a new architecture, named deep fried convolutional networks, which substantially reduces the memory footprint of convolutional networks trained on MNIST and ImageNet with no drop in predictive performance.
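To make the idea of a Fastfood layer concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard Fastfood factorization S H G Π H B, where S, G and B are diagonal matrices, Π is a fixed permutation and H is the Walsh-Hadamard matrix. In the adaptive variant the three diagonals would be learned by backpropagation; here they are simply passed in as arrays. The function names `fwht` and `fastfood_layer`, the 1/d normalization and the omission of any nonlinearity or bias are illustrative assumptions.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis.

    The last dimension must be a power of two. Cost is O(d log d)
    instead of the O(d^2) of an explicit matrix product.
    """
    x = np.array(x, dtype=float)  # work on a float copy
    d = x.shape[-1]
    assert d > 0 and d & (d - 1) == 0, "length must be a power of two"
    h = 1
    while h < d:
        # View the signal as blocks of size 2h, split into two halves of size h.
        y = x.reshape(x.shape[:-1] + (d // (2 * h), 2, h))
        top = y[..., 0, :] + y[..., 1, :]
        bottom = y[..., 0, :] - y[..., 1, :]
        x = np.stack([top, bottom], axis=-2).reshape(x.shape)
        h *= 2
    return x


def fastfood_layer(x, S, G, B, perm):
    """Apply (1/d) * S H G Pi H B to each row of x.

    S, G, B : 1-D arrays of length d (diagonal entries; the trainable
              parameters in an adaptive version, assumed here).
    perm    : fixed permutation of range(d), stored as an index array.
    """
    d = x.shape[-1]
    z = fwht(B * x)      # H B x
    z = z[..., perm]     # Pi H B x
    z = fwht(G * z)      # H G Pi H B x
    return S * z / d     # assumed scaling: each H normalized by 1/sqrt(d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8  # hypothetical layer width, chosen only for the demo
    S, G, B = rng.normal(size=(3, d))
    perm = rng.permutation(d)
    x = rng.normal(size=(2, d))
    print(fastfood_layer(x, S, G, B, perm).shape)
```

The memory argument follows from counting stored values: a dense d-by-d layer keeps d^2 weights, whereas this sketch keeps only the 3d diagonal entries plus a permutation. For a hypothetical width of d = 4096, that is 16,777,216 weights against 12,288 diagonal entries.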
