Generalization of CNNs on Relational Reasoning with Bar Charts
Zhenxing Cui, Lu Chen, Yunhai Wang, Daniel Haehn, Yong Wang, Hanspeter Pfister
TL;DR
This paper investigates how CNNs and humans generalize on relational reasoning with bar charts, focusing on robustness to variations in visualization design. It revisits prior work, expands the stimulus space with standard Vega-Lite visualizations to create the GRAPE dataset, and conducts IID and OOD evaluations of CNNs versus humans. The findings show that CNNs can match or exceed human performance when training and test encodings align, but their generalization deteriorates under perturbations, whereas humans are more robust and rely primarily on bar lengths. The work also applies Grad-CAM analyses and reports improvements from segmentation masks, highlighting the need for task-oriented attention and suggesting future exploration of transformers and AutoML to improve robust relational reasoning in visualizations.
Abstract
This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task, estimating bar length ratios in a bar chart, by progressively perturbing the standard visualizations. We further conduct a user study to compare the performance of CNNs and humans. Our results show that CNNs outperform humans only when the training and test data have the same visual encodings; otherwise, they may perform worse than humans. We also find that CNNs are sensitive to perturbations in various visual encodings, regardless of their relevance to the target bars. Humans, in contrast, are mainly influenced by bar lengths. Our study suggests that robust relational reasoning with visualizations is challenging for CNNs. Improving CNNs' generalization performance may require training them to better recognize task-related visual properties.
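To make the bar length ratio task concrete, the following is a minimal Python sketch of how a single stimulus and its label might be generated and scored. It is an illustration under stated assumptions, not the paper's pipeline: the function names (`make_sample`, `mean_absolute_error`), the number of bars, the height range, the shorter-over-longer ratio convention, and the use of mean absolute error as the score are all hypothetical choices made here for clarity.

```python
import random


def make_sample(rng, n_bars=5, min_len=10, max_len=100):
    """Generate one hypothetical stimulus.

    Returns the bar heights, the indices of the two target bars,
    and the ratio label (shorter target over longer target).
    All parameter defaults are illustrative assumptions.
    """
    heights = [rng.randint(min_len, max_len) for _ in range(n_bars)]
    i, j = rng.sample(range(n_bars), 2)  # two distinct target bars
    shorter, longer = sorted((heights[i], heights[j]))
    return heights, (i, j), shorter / longer


def mean_absolute_error(preds, labels):
    """Average absolute difference between predicted and true ratios."""
    return sum(abs(p - t) for p, t in zip(preds, labels)) / len(labels)


rng = random.Random(0)  # fixed seed for reproducibility
heights, targets, ratio = make_sample(rng)
print(heights, targets, round(ratio, 3))
```

Under this framing, an IID evaluation would score a model on stimuli rendered with the same visual encodings used in training, while an OOD evaluation would keep the same heights and labels but change the rendering (for example, a perturbed encoding) before scoring.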
