Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition

Koray Kavukcuoglu, Marc'Aurelio Ranzato, Yann LeCun

TL;DR

The paper tackles the computational bottleneck of sparse coding in vision tasks by introducing Predictive Sparse Decomposition (PSD), which jointly learns a basis and a nonlinear feed-forward regressor that predicts sparse codes. The method optimizes a loss that enforces accurate reconstruction, sparsity, and predictability. This yields fast, smooth approximate inference that matches or surpasses exact sparse coding in recognition accuracy. Experiments on MNIST and Caltech-style tasks show substantial speedups (over 100×) with competitive or superior performance, highlighting PSD's potential for real-time object recognition. The work also examines how stable the codes are across consecutive video frames and discusses future directions toward convolutional and hierarchical deep architectures.
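The three terms named above (reconstruction, sparsity, predictability) can be combined into a single energy. The form below is a sketch under assumptions. $X$ is an input patch and $Z$ its code. $B$ is the basis matrix whose columns are the learned basis functions. $F$ is a feed-forward predictor with filter matrix $W$, bias vector $D$ and diagonal gain matrix $G$ (collectively $P_f$), and $\lambda$ and $\alpha$ weight the sparsity and prediction terms. Only $B$ is named in this summary; the other symbols and the exact predictor form are assumed here rather than quoted from the paper.

```latex
L(X, Z; B, P_f) = \|X - BZ\|_2^2 + \lambda \|Z\|_1 + \alpha \|Z - F(X; G, W, D)\|_2^2,
\qquad F(X; G, W, D) = G \tanh(WX + D)
```

The first two terms are ordinary $\ell^1$-regularized sparse coding. The third term pulls the optimal code toward what the predictor can produce in a single pass. After training, $F(X)$ alone can therefore stand in for the expensive optimization.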

Abstract

Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibitive cost of the optimization algorithms required to compute the sparse representation. In this work we propose a simple and efficient algorithm to learn basis functions. After training, this model also provides a fast and smooth approximator to the optimal representation, achieving even better accuracy than exact sparse coding algorithms on visual object recognition tasks.
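Learning the basis functions and obtaining the fast approximator can be sketched together. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the per-sample energy ||x - B z||^2 + lam*||z||_1 + alpha*||z - g*tanh(W x + D)||^2, with a per-unit gain g. For code inference it substitutes ISTA (iterative shrinkage-thresholding) for whatever solver the paper uses. It also takes one plain gradient step per sample and rescales the columns of B to unit norm. All function names (`soft_threshold`, `predictor`, `psd_energy`, `infer_code`, `psd_step`) and hyperparameters (`lam`, `alpha`, `lr`, `n_iter`) are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise shrinkage: the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def predictor(x, W, D, g):
    """Hypothetical feed-forward code predictor: g * tanh(W x + D), g a per-unit gain."""
    return g * np.tanh(W @ x + D)


def psd_energy(x, z, B, z_pred, lam, alpha):
    """Reconstruction + sparsity + prediction error for one sample."""
    return (np.sum((x - B @ z) ** 2)
            + lam * np.sum(np.abs(z))
            + alpha * np.sum((z - z_pred) ** 2))


def infer_code(x, B, z_pred, lam, alpha, n_iter=200):
    """Minimize psd_energy over z with ISTA, starting from the predicted code.

    ISTA is a stand-in solver chosen for brevity, not the paper's.
    """
    # Lipschitz constant of the gradient of the two quadratic terms.
    L = 2.0 * (np.linalg.norm(B, 2) ** 2 + alpha)
    z = z_pred.copy()
    for _ in range(n_iter):
        grad = 2.0 * B.T @ (B @ z - x) + 2.0 * alpha * (z - z_pred)
        z = soft_threshold(z - grad / L, lam / L)
    return z


def psd_step(x, B, W, D, g, lam=0.1, alpha=1.0, lr=0.01):
    """One online training step: infer z, then one gradient step on B and the predictor."""
    h = np.tanh(W @ x + D)
    z_pred = g * h
    z = infer_code(x, B, z_pred, lam, alpha)
    # Basis update: gradient step on ||x - B z||^2 with z held fixed.
    B = B + lr * 2.0 * np.outer(x - B @ z, z)
    # Assumed: rescale each basis function (column of B) to unit l2 norm.
    B = B / np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-12)
    # Predictor update: gradient step on alpha * ||z_pred - z||^2 with z held fixed.
    r = z_pred - z
    d_a = 2.0 * alpha * r * g * (1.0 - h ** 2)  # gradient w.r.t. W x + D
    g = g - lr * 2.0 * alpha * r * h
    W = W - lr * np.outer(d_a, x)
    D = D - lr * d_a
    return B, W, D, g, z
```

After training, the fast path is just `predictor(x, W, D, g)`: one matrix-vector product and a `tanh`, with no iterative optimization. A shortcut of this kind is what makes feed-forward code extraction so much cheaper than exact inference.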

Paper Structure

This paper contains 10 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Classification error on MNIST as a function of reconstruction error, using raw pixel values and PCA, RBM, SESM and PSD features. Left to right: 10, 100 and 1000 samples per class are used to train a linear classifier on the features. The unsupervised algorithms were trained on the first 20,000 training samples of the MNIST dataset.
  • Figure 2: a) 256 basis functions of size 12x12 learned by PSD, trained on the Berkeley dataset. Each 12x12 block is a column of the matrix $B$ in the PSD loss equation, i.e. a basis function. b) Object recognition architecture: linear adaptive filter bank, followed by absolute-value ($abs$) rectification, average down-sampling and a linear SVM classifier.
  • Figure 3: a) Speedup of the PSD predictor over FS when inferring a sparse representation with 64 units; the feed-forward extraction is more than 100 times faster. b) Recognition accuracy versus measured sparsity (average $\ell^1$ norm of the representation) for the PSD predictor, compared to the representation produced by the FS algorithm. Differences within 1% are not statistically significant. c) Recognition accuracy as a function of the number of basis functions.
  • Figure 4: Conditional probabilities of sign transitions between two consecutive frames. For instance, $P(-|+)$ is the conditional probability that a unit is negative given that it was positive in the previous frame. The figure on the right serves as a baseline, showing the same conditional probabilities computed on pairs of random frames.
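Several quantities in these captions are straightforward to compute. The sketch below assumes NumPy 1.20 or later, which provides `sliding_window_view`. It implements three pieces. The first is the Figure 2b front end: a linear filter bank, absolute-value rectification and average down-sampling. Valid-mode correlation and non-overlapping pooling are assumptions, and the linear SVM stage is omitted. The second is the sparsity measure of Figure 3b, the average $\ell^1$ norm per code vector. The third is the sign-transition conditional probabilities of Figure 4. The function names and the `tol` threshold for treating a value as zero are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_features(image, filters, pool=4):
    """Filter bank -> abs rectification -> average down-sampling (the Figure 2b front end).

    image: 2-D array; filters: array of shape (k, fh, fw).
    'Valid' correlation and non-overlapping pooling are assumptions.
    """
    k, fh, fw = filters.shape
    patches = sliding_window_view(image, (fh, fw))               # (oh, ow, fh, fw)
    maps = np.abs(np.einsum('ijab,kab->kij', patches, filters))  # (k, oh, ow)
    oh, ow = maps.shape[1:]
    ph, pw = oh // pool, ow // pool
    # Drop trailing rows/columns that do not fill a whole pooling window.
    maps = maps[:, :ph * pool, :pw * pool]
    pooled = maps.reshape(k, ph, pool, pw, pool).mean(axis=(2, 4))
    return pooled.ravel()  # input to a linear classifier such as a linear SVM


def average_l1(codes):
    """Sparsity measure of Figure 3b: mean l1 norm of the code vectors (rows)."""
    return float(np.mean(np.sum(np.abs(codes), axis=1)))


def sign_transition_probs(prev_codes, next_codes, tol=0.0):
    """Estimate P(sign in next frame | sign in previous frame), pooled over all units.

    Values with |z| <= tol count as zero (tol is a hypothetical parameter).
    Keys are (next_sign, prev_sign), so probs[(-1, 1)] is P(-|+).
    """
    def signs(c):
        c = np.asarray(c, dtype=float).ravel()
        return np.where(np.abs(c) <= tol, 0, np.sign(c)).astype(int)

    prev_s, next_s = signs(prev_codes), signs(next_codes)
    probs = {}
    for a in (-1, 0, 1):          # sign in the previous frame
        mask = prev_s == a
        n = int(mask.sum())
        for b in (-1, 0, 1):      # sign in the next frame
            probs[(b, a)] = float(np.sum(next_s[mask] == b)) / n if n else float('nan')
    return probs
```

For the right-hand baseline in Figure 4, the same function can be applied to codes from randomly paired frames instead of consecutive ones.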