Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition
Koray Kavukcuoglu, Marc'Aurelio Ranzato, Yann LeCun
TL;DR
The paper tackles the computational bottleneck of sparse coding in vision tasks by introducing Predictive Sparse Decomposition (PSD), which jointly learns a basis and a nonlinear feed-forward regressor that predicts the sparse codes. The method minimizes a loss with three terms that enforce accurate reconstruction, sparsity, and predictability of the code. After training, the regressor provides fast, smooth approximate inference that matches or surpasses exact sparse coding in recognition accuracy. Experiments on MNIST and Caltech-style tasks show substantial speedups (over 100×) with competitive or superior performance, highlighting PSD's potential for real-time object recognition. The work also explores the stability of the representation under natural input changes and discusses future directions toward convolutional and hierarchical deep architectures.
Abstract
Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibitive cost of the optimization algorithms required to compute the sparse representation. In this work we propose a simple and efficient algorithm to learn basis functions. After training, this model also provides a fast and smooth approximator to the optimal representation, achieving even better accuracy than exact sparse coding algorithms on visual object recognition tasks.
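The three-term objective summarized above (reconstruction, sparsity, predictability) can be illustrated with a minimal NumPy sketch. The excerpt does not spell out the formula, so the form used here is an assumption: loss = ||y - Bz||^2 + lam * ||z||_1 + alpha * ||z - F(y)||^2, with a regressor F(y) = G * tanh(Wy + D). The names predict_code and psd_loss, the default weights lam and alpha, and the toy dimensions are all hypothetical and chosen only for illustration.

```python
import numpy as np


def predict_code(y, W, D, G):
    """Feed-forward regressor, assumed form F(y) = G * tanh(W y + D).

    G holds per-unit gains (a diagonal matrix stored as a vector).
    """
    return G * np.tanh(W @ y + D)


def psd_loss(y, z, B, W, D, G, lam=0.5, alpha=1.0):
    """Assumed three-term objective: reconstruction + sparsity + predictability."""
    reconstruction = np.sum((y - B @ z) ** 2)   # how well B z rebuilds the input
    sparsity = lam * np.sum(np.abs(z))          # L1 penalty on the code
    predictability = alpha * np.sum((z - predict_code(y, W, D, G)) ** 2)
    return reconstruction + sparsity + predictability


# Toy setup with random, untrained parameters (illustrative sizes only).
rng = np.random.default_rng(0)
n_pixels, n_codes = 16, 32                      # overcomplete: more codes than pixels
y = rng.standard_normal(n_pixels)               # stand-in for an image patch
B = rng.standard_normal((n_pixels, n_codes))    # basis functions as columns
W = 0.1 * rng.standard_normal((n_codes, n_pixels))
D = np.zeros(n_codes)
G = np.ones(n_codes)

# Fast approximate inference: a single feed-forward pass, no optimization loop.
z_fast = predict_code(y, W, D, G)
print(psd_loss(y, z_fast, B, W, D, G))
```

Under these assumptions, inference after training reduces to one call to predict_code, that is, one matrix-vector product followed by an elementwise nonlinearity and gain. This is the kind of computation that would account for the speedup over iterative sparse coding solvers.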
