Visual Categorization Across Minds and Models: Cognitive Analysis of Human Labeling and Neuro-Symbolic Integration

Chethana Prasad Kabgere

TL;DR

The paper investigates how humans and AI categorize ambiguous low-resolution visuals, contrasting symbolic, analogical, and embodied human strategies with feature-based CNN processing. Using a ResNet-18 baseline and Grad-CAM visualizations on CIFAR-10 stimuli, the study demonstrates robust human performance driven by shape-based prototypes and contextual grounding, while the AI relies on texture-driven features with limited interpretability. The findings highlight parallels and gaps across Marr’s levels, bounded rationality, and PDP-inspired representations, and argue for neuro-symbolic architectures that fuse structured reasoning with sub-symbolic perception for improved interpretability and robustness. The work advances understanding of cognitive alignment in AI, proposing concrete pathways for interpretable, context-aware systems that more closely mimic human visual reasoning and decision-making.

Abstract

Understanding how humans and AI systems interpret ambiguous visual stimuli offers critical insight into the nature of perception, reasoning, and decision-making. This paper examines image labeling performance across human participants and deep neural networks, focusing on low-resolution, perceptually degraded stimuli. Drawing on computational cognitive science, cognitive architectures, and connectionist-symbolic hybrid models, we contrast human strategies such as analogical reasoning, shape-based recognition, and confidence modulation with AI's feature-based processing. Grounded in Marr's tri-level hypothesis, Simon's bounded rationality, and Thagard's frameworks of representation and emotion, we analyze participant responses in relation to Grad-CAM visualizations of model attention. Human behavior is further interpreted through cognitive principles modeled in ACT-R and Soar, revealing layered and heuristic decision strategies under uncertainty. Our findings highlight key parallels and divergences between biological and artificial systems in representation, inference, and confidence calibration. The analysis motivates future neuro-symbolic architectures that unify structured symbolic reasoning with connectionist representations. Such architectures, informed by principles of embodiment, explainability, and cognitive alignment, offer a path toward AI systems that are not only performant but also interpretable and cognitively grounded.

Paper Structure

This paper contains 43 sections, 1 equation, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of the experimental setup: the same raw input image is processed by a deep learning model (left) and human participants (right). AI predicts a label based on learned hierarchical features and weights, while human responses are analyzed through dimensions such as confidence, strategy, emotion, and trust.
  • Figure 2: Left: Full set of 10 CIFAR-10 images used in the study. Center: Selected Grad-CAM overlays for three representative stimuli. Right: Screenshot of the ResNet-18 prediction table (file, predicted label, confidence, correctness).
  • Figure 3: Schematic overview of the labeling pipeline in a deep CNN: from raw input to category decision, with softmax-based classification and CAM-based visual explanation.
  • Figure 4: Human visual recognition and decision model: biological and computational analogues across brain regions and artificial networks.
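Figure 3 describes the CNN labeling pipeline as softmax-based classification followed by a CAM-based visual explanation. The sketch below illustrates those two steps in NumPy under stated assumptions: a ResNet-style head (global average pooling followed by one linear layer), toy arrays standing in for real ResNet-18 activations, and hypothetical helper names (`classify`, `grad_cam`, `upsample_nearest`). It is not the paper's implementation. With this head the gradient that Grad-CAM needs has a closed form, so no autograd library is required.

```python
import numpy as np

# CIFAR-10 class names in the dataset's standard label order.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classify(features, fc_weight, fc_bias):
    """Hypothetical ResNet-style head: global average pooling, then a linear layer.

    features: (K, h, w) activations of the last conv block.
    fc_weight: (C, K); fc_bias: (C,).
    Returns (predicted class index, softmax confidence, logits).
    """
    pooled = features.mean(axis=(1, 2))        # (K,)
    logits = fc_weight @ pooled + fc_bias      # (C,)
    probs = softmax(logits)
    pred = int(np.argmax(probs))
    return pred, float(probs[pred]), logits


def grad_cam(features, fc_weight, target_class):
    """Grad-CAM heatmap for target_class, normalised to [0, 1].

    Assumes the GAP + linear head above, for which
    d(logit_c) / d(A_k[i, j]) = fc_weight[c, k] / (h * w).
    """
    _, h, w = features.shape
    # Closed-form gradient of the target logit w.r.t. every activation.
    grads = np.broadcast_to(
        fc_weight[target_class][:, None, None] / (h * w), features.shape)
    alphas = grads.mean(axis=(1, 2))           # (K,) channel importance weights
    # ReLU of the alpha-weighted sum of feature maps -> (h, w) heatmap.
    cam = np.maximum(np.tensordot(alphas, features, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling so the heatmap can overlay the input image."""
    return np.kron(cam, np.ones((factor, factor)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: 512 channels as in ResNet-18's last block; the 4x4 map
    # assumes a CIFAR-style ResNet-18 variant on 32x32 inputs.
    features = rng.random((512, 4, 4))
    fc_weight = rng.normal(size=(10, 512))
    fc_bias = np.zeros(10)
    pred, conf, _ = classify(features, fc_weight, fc_bias)
    heatmap = upsample_nearest(grad_cam(features, fc_weight, pred), 8)
    print(CIFAR10_CLASSES[pred], round(conf, 3), heatmap.shape)
```

Because the gradient of a pooling-plus-linear head is constant across spatial positions, Grad-CAM here reduces to the original CAM up to a scale factor. For intermediate layers or deeper heads, the gradients would instead have to be obtained by backpropagation through a framework's autograd.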