A Compact Deep Architecture for Real-time Saliency Prediction

Saman Zabihi, Hamed Rezazadegan Tavakoli, Ali Borji

TL;DR

A compact yet fast model for real-time saliency prediction that combines a modified U-Net architecture, a novel fully connected layer that implicitly captures location-dependent information, and central difference convolutional layers.

Abstract

Saliency computation models aim to imitate the attention mechanism in the human visual system. The application of deep neural networks to saliency prediction has led to a drastic improvement over the last few years. However, deep models have a large number of parameters, which makes them less suitable for real-time applications. Here we propose a compact yet fast model for real-time saliency prediction. Our proposed model consists of a modified U-Net architecture, a novel fully connected layer, and central difference convolutional layers. The modified U-Net architecture promotes compactness and efficiency. The novel fully connected layer facilitates the implicit capture of location-dependent information. Using the central difference convolutional layers at different scales enables the capture of more robust and biologically motivated features. We compare our model with state-of-the-art saliency models using traditional saliency scores as well as our newly devised scheme. Experimental results on four challenging saliency benchmark datasets demonstrate the effectiveness of our approach in striking a balance between accuracy and speed. Our model can run in real time, which makes it appealing for edge devices and video processing.
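To make the idea of a central difference convolutional layer concrete, the following is a minimal pure-Python sketch. It assumes the commonly used formulation in which the response mixes a vanilla convolution with a convolution over differences from the window's center pixel, weighted by a hyperparameter theta; the excerpt above does not state which exact variant the authors use. The function names, the theta default, and the single-channel "valid" setup are illustrative assumptions, not the paper's implementation.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on nested lists (single channel)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def central_difference_conv2d(image, kernel, theta=0.7):
    """Assumed formulation (hypothetical helper, not the authors' code):

        y = theta * sum_n w_n * (x_n - x_center) + (1 - theta) * sum_n w_n * x_n

    which simplifies to: vanilla_conv - theta * x_center * sum(w).
    theta = 0 recovers the vanilla convolution; theta = 1 uses only
    differences from the center pixel of each window.
    """
    kh, kw = len(kernel), len(kernel[0])
    vanilla = conv2d_valid(image, kernel)
    kernel_sum = sum(sum(row) for row in kernel)
    ch, cw = kh // 2, kw // 2  # offset of the window's center pixel
    return [
        [
            vanilla[r][c] - theta * kernel_sum * image[r + ch][c + cw]
            for c in range(len(vanilla[0]))
        ]
        for r in range(len(vanilla))
    ]


if __name__ == "__main__":
    flat = [[5] * 4 for _ in range(4)]   # constant-intensity patch
    ones = [[1] * 3 for _ in range(3)]   # 3x3 kernel of ones
    print(central_difference_conv2d(flat, ones, theta=1.0))
```

Under this formulation, a constant-intensity patch produces a zero response when theta is 1, because every difference from the center pixel vanishes. The layer therefore responds to local intensity changes rather than absolute intensity, which is one plausible reading of the "more robust" features mentioned in the abstract.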

Paper Structure

This paper contains 23 sections, 5 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Our architecture for fast saliency computation. It is an asymmetric U-Net-like network composed of MobileNetV2 as the backend, central difference convolutional layers for extracting more consistent patterns, and a fully connected layer for extracting location-dependent features.
  • Figure 2: MobileNetV2 Convolutional Block RN33
  • Figure 3: Left) NSS vs FLOPs, Right) NSS vs the number of parameters
  • Figure 4: Left) CC vs FLOPs, Right) CC vs the number of parameters
  • Figure 5: Left) AUC vs FLOPs, Right) AUC vs the number of parameters
  • ...and 3 more figures