Degraded Polygons Raise Fundamental Questions of Neural Network Perception

Leonard Tang, Dan Ley

TL;DR

This work revisits the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision, and implements the Automated Shape Recoverability Test for rapidly generating large-scale datasets of perimeter-degraded regular polygons.

Abstract

It is well known that modern computer vision systems often exhibit behaviors misaligned with those of humans: from adversarial attacks to image corruptions, deep learning vision models suffer in a variety of settings that humans capably handle. In light of these phenomena, here we introduce another, orthogonal perspective studying the human-machine vision gap. We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision. Specifically, we study the performance and behavior of neural networks on the seemingly simple task of classifying regular polygons at varying orders of degradation along their perimeters. To this end, we implement the Automated Shape Recoverability Test for rapidly generating large-scale datasets of perimeter-degraded regular polygons, modernizing the historically manual creation of image recoverability experiments. We then investigate the capacity of neural networks to recognize and recover such degraded shapes when initialized with different priors. Ultimately, we find that neural networks' behavior on this simple task conflicts with human behavior, raising a fundamental question of the robustness and learning capabilities of modern computer vision models.

Paper Structure

This paper contains 21 sections, 1 equation, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Specific instance of the Automated Shape Recoverability Test generation pipeline for an example pentagon with 50% degradation proportion. Whole shapes are generated and subsequently edited with corner degradation (top) and edge degradation (bottom). Our experiments indicate that, unlike time-constrained humans performing sketch recovery [biederman], neural networks rely heavily on edges rather than corners to recover degraded shapes.
  • Figure 2: The five classes of nonaccidental properties (NAPs) for object recognition in the visual cortex are 1) collinearity, the presence of straight lines; 2) curvilinearity, the presence of smoothly curved elements; 3) symmetry across arbitrary axes; 4) parallel curves; and 5) vertices, junctions of two or more contours [lowe1985perceptual]. Critically, cognitive scientists suggest that NAPs form a perceptual basis for the set of components that enable object recognition.
  • Figure 3: Top-1 test accuracy (%) within $\pm 1$ SD (across 10 training trials) of ImageNet-pretrained and whole-polygon finetuned models on the shape recovery task. Accuracy decreases as degradation proportion, $p_d$, increases. Moreover, ResNet-18, ResNet-50, and MLP-Mixer all exhibit worse performance on edge-degraded shapes than on corner-degraded shapes, the opposite of human behavior.
  • Figure 4: Confusion matrices at 30%, 50%, and 70% removal proportions for corner-degraded shapes (top) and edge-degraded shapes (bottom) using ImageNet-pretrained ResNet-18, ResNet-50, MLP-Mixer, and ViT. As removal proportion increases, models default to predicting a single class.
  • Figure 5: Top-1 test accuracy (%) within $\pm 1$ SD (across 10 training trials) of FractalDB-pretrained and whole-polygon finetuned models on the shape recovery task. Again, accuracy decreases across the board as degradation proportion, $p_d$, increases. Compared to their ImageNet-pretrained counterparts, however, ResNet-18 and ResNet-50 both retain better performance on corner-degraded shapes. We also note the performance gap between corner-degraded and edge-degraded shapes, the opposite of human behavior.
  • ...and 2 more figures
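
The corner and edge degradation described in the Figure 1 caption can be illustrated with a small geometric sketch. This is not the authors' Automated Shape Recoverability Test implementation. It assumes that a degradation proportion $p_d$ removes the same fraction of every side, centered on the vertices for corner degradation and on the side midpoints for edge degradation. The function names (`regular_polygon`, `degrade`, `total_length`) and the `mode` parameter are hypothetical, and rendering the segments to images is left out.

```python
import math


def regular_polygon(n_sides, radius=1.0):
    """Vertices of a regular polygon centered at the origin, counterclockwise."""
    return [
        (radius * math.cos(2 * math.pi * k / n_sides),
         radius * math.sin(2 * math.pi * k / n_sides))
        for k in range(n_sides)
    ]


def lerp(a, b, t):
    """Point at fraction t along the straight line from a to b."""
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


def degrade(vertices, p_d, mode):
    """Segments left after removing a fraction p_d of each side.

    Assumed scheme (not confirmed by the paper):
      mode="corner": erase p_d / 2 of the side next to each vertex.
      mode="edge":   erase p_d of the side around its midpoint.
    """
    if not 0.0 <= p_d < 1.0:
        raise ValueError("p_d must be in [0, 1)")
    if mode not in ("corner", "edge"):
        raise ValueError("mode must be 'corner' or 'edge'")

    kept = []
    n = len(vertices)
    for i in range(n):
        a, b = vertices[i], vertices[(i + 1) % n]
        if mode == "corner":
            # Keep only the middle of the side; both vertices disappear.
            kept.append((lerp(a, b, p_d / 2), lerp(a, b, 1 - p_d / 2)))
        else:
            # Keep a stub at each end of the side; the vertices survive.
            half_keep = (1 - p_d) / 2
            kept.append((a, lerp(a, b, half_keep)))
            kept.append((lerp(a, b, 1 - half_keep), b))
    return kept


def total_length(segments):
    """Sum of the Euclidean lengths of a list of (start, end) segments."""
    return sum(math.dist(p, q) for p, q in segments)


pentagon = regular_polygon(5)
corner_degraded = degrade(pentagon, 0.5, "corner")
edge_degraded = degrade(pentagon, 0.5, "edge")
```

Under this scheme both variants keep the same share of the perimeter at a given $p_d$, so any accuracy difference between them would reflect which part of the contour was removed rather than how much.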