Training Better Deep Learning Models Using Human Saliency

Aidan Boyd, Patrick Tinsley, Kevin W. Bowyer, Adam Czajka

TL;DR

CYBORG training of CNNs addresses important issues such as reducing the appetite for large training sets, increasing interpretability, and reducing fragility by generalizing better to new types of data.

Abstract

This work explores how human judgement about salient regions of an image can be introduced into deep convolutional neural network (DCNN) training. Traditionally, training of DCNNs is purely data-driven, which often results in learning features of the data that are only coincidentally correlated with class labels. Human saliency can guide network training through our proposed new component of the loss function that ConveYs Brain Oversight to Raise Generalization (CYBORG) and penalizes the model for using non-salient regions. This mechanism produces DCNNs that achieve higher accuracy and better generalization than training on the same data without human saliency. Experimental results demonstrate that CYBORG applies across multiple network architectures and problem domains (detection of synthetic faces, iris presentation attacks, and anomalies in chest X-rays), while requiring significantly less data than training without human saliency guidance. Visualizations show that the saliency of CYBORG-trained models is more consistent across independent training runs than that of traditionally trained models, and is also in better agreement with humans. To lower the cost of collecting human annotations, we also explore using deep learning to provide automated annotations. CYBORG training of CNNs addresses important issues such as reducing the appetite for large training sets, increasing interpretability, and reducing fragility by generalizing better to new types of data.
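
The abstract describes a loss component that penalizes the model for relying on regions humans did not judge salient, but it does not give the formula. The sketch below shows one plausible form, under stated assumptions: the model's saliency is taken to be a class activation map (CAM) for the true class, the penalty is the mean squared error between the min-max normalized CAM and the normalized human annotation map, and a weight `alpha` balances it against cross-entropy. The function names, the choice of CAM, the MSE penalty, and the default `alpha=0.5` are all illustrative assumptions, not details confirmed by this summary.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def min_max_normalize(maps, eps=1e-8):
    """Scale each map in a batch of shape (N, H, W) to roughly [0, 1]."""
    lo = maps.min(axis=(1, 2), keepdims=True)
    hi = maps.max(axis=(1, 2), keepdims=True)
    return (maps - lo) / (hi - lo + eps)

def class_activation_maps(features, fc_weights, labels):
    """CAM for each sample's true class (assumed saliency method).

    features:   (N, C, H, W) activations of the last conv layer
    fc_weights: (K, C) weights of the classifier that follows global pooling
    labels:     (N,) integer class labels
    """
    w = fc_weights[labels]                         # (N, C)
    return np.einsum("nc,nchw->nhw", w, features)  # (N, H, W)

def saliency_guided_loss(logits, labels, model_maps, human_maps, alpha=0.5):
    """Hypothetical combined loss: alpha * CE + (1 - alpha) * saliency MSE."""
    probs = softmax(logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    diff = min_max_normalize(model_maps) - min_max_normalize(human_maps)
    sal = np.mean(diff ** 2)
    return alpha * ce + (1.0 - alpha) * sal, ce, sal

# Toy usage with random "features" and a square human annotation.
rng = np.random.default_rng(0)
features = rng.random((2, 4, 7, 7))
fc_weights = rng.normal(size=(2, 4))
labels = np.array([0, 1])
logits = features.mean(axis=(2, 3)) @ fc_weights.T  # global average pooling + FC
cams = class_activation_maps(features, fc_weights, labels)
human = np.zeros((2, 7, 7))
human[:, 2:5, 2:5] = 1.0                            # region marked salient by a human
total, ce, sal = saliency_guided_loss(logits, labels, cams, human)
print(f"total={total:.4f}  cross_entropy={ce:.4f}  saliency={sal:.4f}")
```

In this form the saliency term is zero when the model's map matches the human map exactly and grows as the model attends elsewhere, so two models with the same classification loss are separated by where they look. In an actual training loop the same computation would be written in an autodiff framework so that gradients flow through the CAM into the convolutional features.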

Paper Structure

This paper contains 35 sections, 2 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Our proposed training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG). CYBORG guides the network throughout training to learn features using image regions judged as salient for human visual perception. This results in a model that is more likely to learn features from regions that are salient to humans, and less likely to learn features that are accidentally correlated with class labels. A boost in generalization performance is demonstrated.
  • Figure 2: Explanation of parameter sets used in this work.
  • Figure 3: Example images from each data source for the task of synthetic face detection.
  • Figure 4: Example images from each data source for the task of iris presentation attack detection.
  • Figure 5: Examples of both healthy and abnormal chest X-rays for the task of abnormality detection from chest X-rays.
  • ...and 10 more figures