Deep Learning for Robust and Explainable Models in Computer Vision

Mohammadreza Amirian

TL;DR

A key message of this thesis is that model interpretation techniques can serve as a tool for understanding vision models and improving their design and robustness, even though interpretability and robustness appear unrelated in the existing literature.

Abstract

Recent breakthroughs in machine learning (ML) and deep learning (DL) research have provided excellent tools for leveraging enormous amounts of data and optimizing huge models with millions of parameters to obtain accurate networks for image processing. These developments open up tremendous opportunities for using artificial intelligence (AI) in the automation and human-assisted AI industries. However, as more and more models are deployed in practice, many challenges have emerged. This thesis presents various approaches that address the robustness and explainability challenges of using ML and DL in practice. Robustness and reliability are critical requirements for any model before certification and deployment. Deep convolutional neural networks (CNNs) are vulnerable to transformations of their inputs, such as rotation and scaling, and to intentional manipulations as described in the adversarial attack literature. In addition, building trust in AI-based models requires a better understanding of current models and the development of methods that are more explainable and interpretable a priori. This thesis presents developments in the robustness and explainability of computer vision models. Furthermore, it offers an example of using feature response visualization (model interpretation) to improve the robustness of vision models, even though interpretability and robustness appear unrelated in the existing literature. Besides methodological developments for robust and explainable vision models, a key message of this thesis is that model interpretation techniques can serve as a tool for understanding vision models and improving their design and robustness. In addition to the theoretical developments, this thesis demonstrates several applications of ML and DL in different contexts, such as medical imaging and affective computing.

Paper Structure (94 sections, 35 equations, 37 figures, 12 tables)


Figures (37)

  • Figure 1: Deep learning libraries often implement the convolution in convolutional neural networks as a cross-correlation. At each position in the input image, the output (kernel response) is the dot product of the vectorized kernel with an image patch of the same size as the kernel. The kernel slides over the entire image area with a given step size (figure adopted from lecun2015deep).
  • Figure 2: Max-pooling (MP) and average-pooling (AP) layers with kernel size and stride of $2$ for CNNs (figure adopted from wang2018multiple).
  • Figure 3: A convolutional neural network for representation learning from an input image, followed by a feedforward network for object classification.
  • Figure 4: The residual connection between a layer's input and output improves the gradient flow (figure adopted from he2016deep).
  • Figure 5: The idea behind inception blocks and their practical implementation (figure adopted from szegedy2015going).
  • ...and 32 more figures
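
The operations described in the captions of Figures 1 and 2 can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the function names `cross_correlate2d` and `pool2d`, the nested-list image representation, and the absence of padding are assumptions made only for this example.

```python
def cross_correlate2d(image, kernel, stride=1):
    """Slide the kernel over the image without flipping it (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(0, ih - kh + 1, stride):
        row = []
        for c in range(0, iw - kw + 1, stride):
            # Dot product of the kernel with the kernel-sized image patch.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


def pool2d(image, size=2, stride=2, mode="max"):
    """Max- or average-pooling over square windows (no padding)."""
    if mode == "max":
        reduce = max
    elif mode == "avg":
        reduce = lambda values: sum(values) / len(values)
    else:
        raise ValueError("mode must be 'max' or 'avg'")
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(0, ih - size + 1, stride):
        row = []
        for c in range(0, iw - size + 1, stride):
            # Collect the window values, then reduce them to one number.
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


# Hypothetical 4x4 input and 2x2 kernel, chosen only for illustration.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]

response = cross_correlate2d(image, kernel, stride=1)
max_pooled = pool2d(image, size=2, stride=2, mode="max")
avg_pooled = pool2d(image, size=2, stride=2, mode="avg")
```

With a kernel size and stride of 2, as in Figure 2, each pooling output covers one non-overlapping 2x2 window, so the 4x4 input shrinks to 2x2.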