Some Improvements on Deep Convolutional Neural Network Based Image Classification

Andrew G. Howard

TL;DR

This paper summarizes the author's entry in the ImageNet Large Scale Visual Recognition Challenge 2013, which achieved a top-5 classification error rate of 13.55% using no external data, over a 20% relative improvement on the previous year's winner.

Abstract

We investigate multiple techniques to improve upon the current state-of-the-art deep convolutional neural network based image classification pipeline. The techniques include adding more image transformations to training data, adding more transformations to generate additional predictions at test time, and using complementary models applied to higher resolution images. This paper summarizes our entry in the ImageNet Large Scale Visual Recognition Challenge 2013. Our system achieved a top-5 classification error rate of 13.55% using no external data, which is over a 20% relative improvement on the previous year's winner.
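
As a rough illustration of generating additional predictions at test time, the sketch below crops three square views from an image and averages a classifier's outputs over those views and their horizontal flips. The function names, the start/middle/end crop placement, the use of flips, and the `predict_fn` callable are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def three_square_views(image):
    """Crop three square views along the longer side of `image`.

    Assumption: the views sit at the start, middle, and end of the longer
    axis. Together they cover all pixels for aspect ratios up to 3:1.
    """
    h, w = image.shape[:2]
    side = min(h, w)
    extra = max(h, w) - side  # pixels left over along the longer axis
    views = []
    for offset in (0, extra // 2, extra):
        if w >= h:
            views.append(image[:, offset:offset + side])
        else:
            views.append(image[offset:offset + side, :])
    return views


def averaged_prediction(image, predict_fn):
    """Average class probabilities over the views and their mirror images.

    `predict_fn` is a hypothetical callable mapping one view to a 1-D
    array of class probabilities.
    """
    views = three_square_views(image)
    views += [v[:, ::-1] for v in views]  # horizontal flips (assumed)
    probs = np.stack([predict_fn(v) for v in views])
    return probs.mean(axis=0)
```

Averaging is only one way to combine the per-view outputs; it is used here because it is the simplest choice to state.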

Paper Structure

This paper contains 12 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Even well-centered images, when cropped, lose information such as the cat's ear and tail compared to the full image on the right. We select training patches from the full image to avoid this loss of information.
  • Figure 2: We generate predictions based on three different square views of the image to incorporate all of the pixels and to take into account differing image sizes.
  • Figure 3: This figure shows the accuracy of the greedy selection algorithm as it adds more predictions, compared to the baseline of 10 predictions and the full set of 90 predictions.
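
The greedy selection mentioned in the Figure 3 caption can be sketched as forward selection over per-view predictions. This is a minimal sketch under stated assumptions, not the paper's exact procedure: it assumes views are picked without replacement, combined by summing class probabilities (equivalent to averaging for ranking purposes), scored by top-k accuracy on a held-out labelled set, and that selection stops once no remaining view strictly improves accuracy. The names `top_k_accuracy` and `greedy_select` are hypothetical.

```python
import numpy as np


def top_k_accuracy(probs, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores.

    probs: array of shape (n_samples, n_classes); labels: (n_samples,).
    """
    top_k = np.argsort(-probs, axis=1)[:, :k]
    return float(np.mean(np.any(top_k == labels[:, None], axis=1)))


def greedy_select(view_probs, labels, max_views=None, k=5):
    """Greedily choose the views whose combined prediction scores best.

    view_probs: array of shape (n_views, n_samples, n_classes).
    Returns the chosen view indices (in selection order) and the accuracy
    of their combined prediction.
    """
    n_views = view_probs.shape[0]
    max_views = n_views if max_views is None else max_views
    selected = []
    running_sum = np.zeros(view_probs.shape[1:], dtype=float)
    best_acc = -1.0
    while len(selected) < max_views:
        best_view, best_view_acc = None, best_acc
        for v in range(n_views):
            if v in selected:
                continue
            acc = top_k_accuracy(running_sum + view_probs[v], labels, k)
            if acc > best_view_acc:  # require a strict improvement
                best_view, best_view_acc = v, acc
        if best_view is None:
            break  # no remaining view helps (assumed stopping rule)
        selected.append(best_view)
        running_sum += view_probs[best_view]
        best_acc = best_view_acc
    return selected, best_acc
```

In this setup, `view_probs` would hold one probability array per candidate prediction (for example, the 90 predictions the caption refers to), and the returned index list gives the subset to evaluate at test time.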