Having Second Thoughts? Let's hear it

Jung H. Lee; Sujith Vijayan

TL;DR

The paper tackles the robustness of deep learning vision models by introducing STCert, a two-stage, brain-inspired certification that leverages top-down information to improve reliability. It employs foundation segmentation models (Grounding DINO and SAM) to identify ROIs from an original prediction and generates second-thought classifications, comparing them to certify outputs. Across ImageNet subsets and multiple architectures, context-aware STCert reduces inter-category errors and can detect artificial and natural adversarial inputs, particularly when the original and second thoughts are produced by different classifiers. While STCert acts as an error detector (not a fix) and incurs computational costs, it demonstrates meaningful gains in safety-focused scenarios and reveals how context and high-level priors influence DL decision-making.
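The two-stage workflow summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The names `stcert`, `CertResult`, `first_classifier`, `second_classifier`, and `locate_roi` are hypothetical. The sketch assumes that a prediction is certified only when the second thought matches the original label exactly, and that a missing ROI leaves the prediction uncertified. In the paper, ROI detection is done with Grounding DINO and SAM; here it is an arbitrary callable so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class CertResult:
    original: str            # label from the first-stage classifier
    second: Optional[str]    # label predicted on the ROI, or None if no ROI was found
    certified: bool          # True only if both predictions agree (assumed rule)
    reason: str              # "match", "mismatch", or "no_box" (hypothetical tags)


def stcert(
    image: Any,
    first_classifier: Callable[[Any], str],
    second_classifier: Callable[[Any], str],
    locate_roi: Callable[[Any, str], Optional[Any]],
) -> CertResult:
    """Two-stage check: classify, localize the predicted object, classify again."""
    original = first_classifier(image)

    # Top-down step: the original label is used as the prompt for ROI detection.
    roi = locate_roi(image, original)
    if roi is None:
        # Assumption: no detected region means the prediction cannot be certified.
        return CertResult(original, None, False, "no_box")

    second = second_classifier(roi)
    if second == original:
        return CertResult(original, second, True, "match")
    return CertResult(original, second, False, "mismatch")


if __name__ == "__main__":
    # Stub components standing in for real models (illustrative only).
    result = stcert(
        image="full image",
        first_classifier=lambda img: "dog",
        second_classifier=lambda roi: "dog",
        locate_roi=lambda img, label: "cropped region",
    )
    print(result)
```

Passing two different callables for `first_classifier` and `second_classifier` corresponds to the two-classifier setting mentioned above. A result with `certified=False` only flags the prediction; it does not supply a corrected label.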

Abstract

Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas. After training, DL models can outperform humans on some domain-specific tasks, but their decision-making process has been known to be easily disrupted. Since the human brain consists of multiple functional areas highly connected to one another and relies on intricate interplays between bottom-up and top-down (from high-order to low-order areas) processing, we hypothesize that incorporating top-down signal processing may make DL models more robust. To address this hypothesis, we propose a certification process mimicking selective attention and test if it could make DL models more robust. Our empirical evaluations suggest that this newly proposed certification can improve DL models' accuracy and help us build safety measures to alleviate their vulnerabilities with both artificial and natural adversarial examples.

Paper Structure (16 sections, 1 equation, 10 figures, 5 tables)

Introduction
Approach
Normal Dataset
Adversarial inputs
ImageNet models
Naive STCert
Context-aware STCert
Implications for the relationships between contexts and models' decision-making
Adversarial Inputs
Artificial adversarial examples
Natural adversarial examples
Two classifier-based STCert on normal inputs
Discussion
Limitations
Broader Impacts
...and 1 more section

Figures (10)

Figure 1: Workflow of STCert. In the first stage, a DL model makes an original prediction, and segmentation models (Grounding DINO and SAM in this study) identify ROIs (regions of interest). The ROIs are then used as inputs to obtain second-thought predictions. Finally, STCert compares the original and second-thought predictions to determine whether the original prediction can be certified.
Figure 1: Comparison of certified predictions on 4 datasets. (A) The same as Fig. 2A, but the dataset is Mixed_13. (B) The same as Fig. 2A, but the dataset is Mixed_13. (C) The same as Fig. 2A, but the dataset is Big_12. (D) The same as Fig. 2A, but the dataset is Geirhos_16.
Figure 2: Certified predictions on Mixed_10. (A) Certified predictions on the masks of objects. (B) Certified predictions on ROIs (see the text). The top row shows the ratio of certified predictions for each model tested. The certified answers are split into correct (CertCorr, blue bars), intra-category error (IntraError, orange bars), and inter-category error (InterError, green bars). Additionally, we report how often STCert fails to certify an originally correct prediction (miss, red bars) and how often it fails to detect bounding boxes for the original prediction (No Box, purple bars); foundation segmentation models sometimes cannot detect any boxes related to the given prompts. The bottom row compares Inter- and Intra-Error between original and certified predictions. Blue and orange bars denote original Intra- and Inter-Error, respectively, whereas green and red bars denote certified Intra- and Inter-Error, respectively.
Figure 2: Examples of images that induce ResNet18 to make 'IntraError', in which the ground truth labels and certified predictions belong to different categories (i.e., superclasses). Ground truth labels and certified labels are shown above each image. The classes before and after the arrow denote ground truth labels and certified predictions, respectively. (A)-(F) Images that include both the ground truth and the certified prediction. (G) An image in which the ground truth and the certified prediction are semantically related to each other. (H) An image that has an incorrect ground truth label.
Figure 3: Evaluation of certified predictions depending on the context width (cw). (A) Schematic of our ROI selection. (B) Correct certified predictions (CertCorr, blue), intra-category error (IntraError, orange), and inter-category error (InterError, green) observed on Mixed_10. 'mask' on the $x$-axis denotes the reference condition in which ROIs do not contain any additional background. (C)-(F) The same as (B), but the datasets are Mixed_13, Living_9, Big_12, and Geirhos_16, respectively.
...and 5 more figures
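
The context width (cw) in the Figure 3 caption controls how much background surrounds the object in an ROI. One way to picture it is as a margin added around a detected bounding box. The sketch below is an assumption for illustration only: the captions do not state the units of cw or how the ROI is built, so `expand_box` is a hypothetical helper that treats cw as a pixel margin and clips the result to the image bounds.

```python
from typing import Tuple

# (left, top, right, bottom) in pixels; this convention is an assumption.
Box = Tuple[int, int, int, int]


def expand_box(box: Box, cw: int, width: int, height: int) -> Box:
    """Grow a bounding box by cw pixels on every side, clipped to the image."""
    if cw < 0:
        raise ValueError("cw must be non-negative")
    left, top, right, bottom = box
    return (
        max(0, left - cw),
        max(0, top - cw),
        min(width, right + cw),
        min(height, bottom + cw),
    )


if __name__ == "__main__":
    # cw = 0 keeps the tight box; larger values admit more background.
    for cw in (0, 5, 20):
        print(cw, expand_box((10, 20, 50, 60), cw, width=100, height=100))
```

Under this reading, cw = 0 gives a tight box around the object, and larger values let the second-stage classifier see progressively more of the surrounding scene. The 'mask' condition in the caption is stricter still, since it removes all background.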
