DeepGaze II: Reading fixations from deep features trained on object recognition
Matthias Kümmerer, Thomas S. A. Wallis, Matthias Bethge
TL;DR
The paper addresses the prediction of human fixations in free-viewing images. It introduces DeepGaze II, which uses fixed VGG-19 features as a general-purpose representation and trains only a compact readout network on top of them to produce a saliency density p(x,y|I) within a probabilistic, log-likelihood framework. The model is pretrained on SALICON, fine-tuned on MIT1003 with image-wise cross-validation, and evaluated on MIT300 using an ensemble of ten models. It explains about 87% of the explainable information gain and achieves top AUC scores on the MIT benchmark. The key contributions are demonstrating strong transfer learning from object-recognition features to saliency, showing that robust performance is possible without retraining the feature extractor, and providing both qualitative and quantitative analyses of where the model succeeds and where it fails. The work highlights the practical value of deep features for related visual tasks and offers a public web service for generating predictions.
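To make the probabilistic framing concrete, the following minimal sketch shows one way a readout score map could be turned into a density p(x,y|I) and scored by the average log-likelihood of observed fixations. The softmax normalization, the function names, and the toy data are assumptions for illustration; the summary above only states that the model outputs a density evaluated in a log-likelihood framework.

```python
import numpy as np


def log_density_from_scores(scores):
    """Normalize a 2D score map into a log-density over pixels.

    Assumption: normalization is a softmax over all pixel locations,
    so that exp(result) sums to 1 over the image.
    """
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())


def mean_log_likelihood(log_density, fixations):
    """Average log-density (in nats) at the fixated (row, col) pixels."""
    fixations = np.asarray(fixations, dtype=int)
    rows, cols = fixations[:, 0], fixations[:, 1]
    return float(log_density[rows, cols].mean())


# Toy example: random scores stand in for a readout network's output.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 6))
log_p = log_density_from_scores(scores)
fixations = [(0, 1), (2, 3), (3, 5)]  # hypothetical fixation locations
print(mean_log_likelihood(log_p, fixations))
```

Because the output is a proper probability distribution, a higher average log-likelihood means the model assigns more probability mass to the locations people actually fixated.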
Abstract
Here we present DeepGaze II, a model that predicts where people look in images. The model uses the features from the VGG-19 deep neural network trained to identify objects in images. In contrast to other saliency models that use deep features, here we use the VGG features for saliency prediction with no additional fine-tuning; instead, only a few readout layers are trained on top of the VGG features to predict saliency. The model is therefore a strong test of transfer learning. After conservative cross-validation, DeepGaze II explains about 87% of the explainable information gain in the patterns of fixations and achieves top performance in area under the curve metrics on the MIT300 hold-out benchmark. These results corroborate the finding from DeepGaze I (which explained 56% of the explainable information gain) that deep features trained on object recognition provide a versatile feature space for performing related visual tasks. We explore the factors that contribute to this success and present several informative image examples. A web service is available to compute model predictions at http://deepgaze.bethgelab.org.
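The "explainable information gain" figure can be read as a ratio of log-likelihood improvements. The sketch below assumes that information gain is the average log-likelihood advantage over a baseline model, in bits per fixation, and that the explainable part is the gain of a gold-standard model over the same baseline. The function names and the toy per-fixation probabilities are hypothetical and are not taken from the paper's data.

```python
import numpy as np


def information_gain(log_p_model, log_p_baseline):
    """Mean log-likelihood advantage over the baseline, in bits per fixation.

    Inputs are natural-log densities evaluated at the same fixations.
    """
    diff = np.asarray(log_p_model, dtype=float) - np.asarray(log_p_baseline, dtype=float)
    return float(diff.mean() / np.log(2))  # convert nats to bits


def information_gain_explained(log_p_model, log_p_baseline, log_p_gold):
    """Fraction of the gold-standard information gain achieved by the model."""
    ig_model = information_gain(log_p_model, log_p_baseline)
    ig_gold = information_gain(log_p_gold, log_p_baseline)
    return ig_model / ig_gold


# Toy per-fixation densities (made-up numbers, for illustration only).
baseline = np.log([0.010, 0.020, 0.010])
model = np.log([0.030, 0.050, 0.020])
gold = np.log([0.040, 0.060, 0.030])

ratio = information_gain_explained(model, baseline, gold)
print(f"information gain explained: {ratio:.2f}")
```

Under this reading, a model that does no better than the baseline scores 0 and a model that matches the gold standard scores 1, so the reported 87% places DeepGaze II close to the gold-standard end of that scale.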
