Robust and Explainable Fine-Grained Visual Classification with Transfer Learning: A Dual-Carriageway Framework

Zheming Zuo, Joseph Smith, Jonathan Stonehouse, Boguslaw Obara

TL;DR

The Dual-Carriageway Framework (DCF) is an automatic framework that searches for the best-suited training solution. It identifies the optimal training strategy while avoiding overfitting, and it yields built-in quantitative and visual explanations derived from the actual input and the weights of the trained model.

Abstract

In practical deep-learning applications of fine-grained visual classification (FGVC), a common scenario is that a model is first trained on a pre-existing dataset and a new dataset later becomes available. Improving inference performance on both datasets then hinges on a pivotal decision: should one train on the datasets from scratch, or fine-tune the model trained on the initial dataset using the newly released one? The existing literature lacks methods for systematically determining the optimal training strategy, and such a determination also needs to be explainable. To fill this gap, we present an automatic framework that searches for the best-suited training solution, the Dual-Carriageway Framework (DCF). DCF is built around a dual-direction search (starting from either the pre-existing or the newly released dataset) in which five different training settings are enforced. DCF not only identifies the optimal training strategy while avoiding overfitting, but also yields built-in quantitative and visual explanations derived from the actual input and the weights of the trained model. We validated DCF's effectiveness through experiments with three convolutional neural networks (ResNet18, ResNet34 and Inception-v3) on two temporally continued commercial product datasets. In terms of mean accuracy, fine-tuning pathways outperformed training-from-scratch pathways by up to 2.13% on the pre-existing dataset and 1.23% on the new dataset. Furthermore, DCF identified reflection padding as the superior padding method, enhancing testing accuracy by 3.72% on average. The framework therefore has the potential to guide the development of robust and explainable AI solutions in fine-grained visual classification tasks.
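To make the padding comparison concrete, the following is a minimal sketch of how reflection padding differs from zero padding. It is not the authors' implementation: the helper `pad_to_square`, the choice of padding to a square canvas, and the toy 2×4 array are assumptions made purely for illustration. The only library call used is NumPy's `np.pad`.

```python
import numpy as np


def pad_to_square(image, mode="reflect"):
    """Pad a 2-D array to a square canvas (hypothetical helper, not from the paper).

    The padding is split as evenly as possible between the two sides of the
    shorter axis. `mode` is passed straight through to `np.pad`.
    """
    h, w = image.shape
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    pad_width = ((top, side - h - top), (left, side - w - left))
    return np.pad(image, pad_width, mode=mode)


# Toy 2x4 "image" with strictly positive intensities.
image = np.arange(1, 9).reshape(2, 4)

zero_padded = pad_to_square(image, mode="constant")    # fills the border with 0
reflect_padded = pad_to_square(image, mode="reflect")  # mirrors rows about the edge

# Zero padding injects artificial zeros into the intensity distribution,
# whereas reflection padding only reuses values already present in the image.
zero_fraction_constant = float((zero_padded == 0).mean())
zero_fraction_reflect = float((reflect_padded == 0).mean())

print(zero_padded)
print(reflect_padded)
print(zero_fraction_constant, zero_fraction_reflect)
```

In this toy case half of the zero-padded canvas consists of artificial zeros, while the reflection-padded canvas contains only values drawn from the original image. This is one plausible reading of why the frequency distribution of the padded input can serve as a quantitative explanation, although the paper's own analysis may differ.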

Paper Structure (14 sections, 6 equations, 10 figures, 8 tables)


Figures (10)

  • Figure 1: A robust and explainable learning framework should be revealed by not only the leveraged FGVC performance over two temporally continued datasets with the existence of subtle pattern differences, imbalanced data samples and high sparsity within the region of interest but also the associated quantitative (via the actual model input, i.e. frequency distribution of the padded image) and visual explanations (through the learned model weights, layer-wise attention) in a putting-through manner.
  • Figure 1: Prediction performance (in %) comparisons on the testing sets of datasets $\mathcal{A}$ (i.e. $\widetilde{\mathcal{A}}$) and $\mathcal{B}$ (i.e. $\widetilde{\mathcal{B}}$) using the baseline models (ResNet34) trained on the training set of dataset $\mathcal{A}$ (i.e. $\widehat{\mathcal{A}}$) that was processed by various padding schemes. Results yielded by the optimal padding scheme are marked in bold and highlighted (detailed in Table \ref{tab:reflec_pad_details}).
  • Figure 2: Sample images of the bottom side of commercial products produced by two manufacturers, 'F1' and 'F2', in pre-existing ($\mathcal{A}$) and newly released ($\mathcal{B}$) datasets. Target regions (dotted parts to be segmented by U-Net \cite{ronneberger2015u} in Sec. \ref{sec:classify}) with ever-evolving anti-counterfeit code patterns embody spatial variations led by varying illumination, camera bias \cite{9534097}, and noisy textures \cite{jia2019coarse} (e.g. dust and stains), affecting the overall FGVC accuracy.
  • Figure 3: Workflow of the proposed DCF for robust and explainable fine-grained visual classification. This figure is separated into two consecutive components and a shared part by colours corresponding to 1) the Padding Scheme Adaptor (Sec. \ref{sec:pad_schemes}), 2) the Training Pathway Selector (Sec. \ref{sec:model_train_settings}), and 3) testing sets with pattern variations derived from two chronologically continued datasets (Sec. \ref{sec:ds}).
  • Figure 4: Candidate padding schemes available in the PSA component of the proposed DCF presented in Fig. \ref{fig:note}.
  • ...and 5 more figures