Human and AI Perceptual Differences in Image Classification Errors
Minghao Liu, Jiaheng Wei, Yang Liu, James Davis
TL;DR
The paper investigates perceptual differences between human and machine classifiers in image classification, looking beyond overall accuracy. It analyzes confusion matrices and partitions task difficulty by machine confidence, machine agreement, and human effort, supplemented by hypothesis testing and collaboration experiments. The findings show that different machine models tend to make similar mistakes, while humans exhibit different error patterns, and that human-machine collaboration can achieve higher accuracy than either alone, both in an idealized oracle setting and in a realistic threshold-based setting. The work highlights practical implications for designing hybrid AI systems, cautions against assuming that machines perceive images the way humans do, and points to potential benefits in high-stakes domains such as medical imaging and in cognitive modeling.
Abstract
Artificial intelligence (AI) models for computer vision trained with supervised machine learning are assumed to solve classification tasks by imitating human behavior learned from training labels. Most recent vision research focuses on measuring model task performance with standardized metrics such as accuracy. However, limited work has sought to understand the perceptual differences between humans and machines. To fill this gap, this study first analyzes the statistical distributions of mistakes from the two sources and then explores how task difficulty level affects these distributions. We find that even when AI learns an excellent model from the training data, one that outperforms humans in overall accuracy, these AI models have significant and consistent differences from human perception. We demonstrate the importance of studying these differences with a simple human-AI teaming algorithm that outperforms humans alone, AI alone, or AI-AI teaming.
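
To make the teaming idea concrete, the following is a minimal sketch of one possible threshold-based human-AI teaming rule, together with an oracle upper bound. It is an illustration under stated assumptions, not the paper's actual algorithm: the function names (`team_predict`, `oracle_predict`, `accuracy`), the threshold value of 0.9, and the toy labels are all hypothetical and chosen only for demonstration.

```python
from typing import List


def team_predict(machine_labels: List[int],
                 machine_conf: List[float],
                 human_labels: List[int],
                 threshold: float = 0.9) -> List[int]:
    """Keep the machine label when its confidence reaches the threshold;
    otherwise defer to the human label. (Assumed rule, for illustration.)"""
    return [m if c >= threshold else h
            for m, c, h in zip(machine_labels, machine_conf, human_labels)]


def oracle_predict(machine_labels: List[int],
                   human_labels: List[int],
                   true_labels: List[int]) -> List[int]:
    """Idealized upper bound: counts as correct whenever either the machine
    or the human is correct. Needs ground truth, so it is not deployable."""
    return [t if (m == t or h == t) else m
            for m, h, t in zip(machine_labels, human_labels, true_labels)]


def accuracy(pred: List[int], true: List[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# Toy data, made up for this sketch (not taken from the paper).
true_labels = [0, 1, 2, 1, 0, 2]
machine_labels = [0, 1, 2, 2, 0, 1]
machine_conf = [0.99, 0.95, 0.97, 0.55, 0.92, 0.60]
human_labels = [0, 2, 2, 1, 1, 2]

team_labels = team_predict(machine_labels, machine_conf, human_labels)
oracle_labels = oracle_predict(machine_labels, human_labels, true_labels)

print("machine:", accuracy(machine_labels, true_labels))
print("human:  ", accuracy(human_labels, true_labels))
print("team:   ", accuracy(team_labels, true_labels))
print("oracle: ", accuracy(oracle_labels, true_labels))
```

In this toy example the machine and the human err on different images, so deferring low-confidence cases to the human recovers the machine's mistakes. Teaming helps only to the extent that the two error patterns differ, which is why measuring those differences matters.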
