
Generalization of CNNs on Relational Reasoning with Bar Charts

Zhenxing Cui, Lu Chen, Yunhai Wang, Daniel Haehn, Yong Wang, Hanspeter Pfister

TL;DR

This paper investigates how CNNs and humans generalize on relational reasoning tasks with bar charts, focusing on robustness to variations in visualization design. It revisits prior work, expands the stimulus space with standard Vega-Lite visualizations to create the GRAPE dataset, and conducts in-distribution (IID) and out-of-distribution (OOD) evaluations of CNNs versus humans. The findings show that CNNs can match or exceed human performance when training and test encodings align, but their generalization deteriorates under perturbations, whereas humans are more robust and rely primarily on bar lengths. The work also presents Grad-CAM analyses and improvements based on segmentation masks, highlighting the need for task-oriented attention and motivating future exploration of transformers and AutoML for robust relational reasoning with visualizations.
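The Grad-CAM analyses mentioned above visualize which image regions drive a CNN's prediction. As a minimal sketch of the standard Grad-CAM weighting step (not the paper's code), assuming the feature maps of one convolutional layer and the gradients of the model output with respect to them have already been extracted with a framework such as PyTorch, the heatmap can be computed from plain arrays. The function name `grad_cam` and the `(K, H, W)` layout are hypothetical choices for illustration.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine conv feature maps into a Grad-CAM heatmap (hypothetical helper).

    activations: feature maps A of shape (K, H, W) from one conv layer.
    gradients:   d(output)/dA with the same shape, assumed to come from autograd.
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel importance weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the K feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # shape (H, W)
    # Normalize so the strongest response is 1 (leave an all-zero map unchanged).
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Upsampling the resulting map to the input size and overlaying it on a bar chart would indicate whether the network attends to the target bars or to task-irrelevant elements such as titles and tick labels.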

Abstract

This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task (estimating bar length ratios in a bar chart) by progressively perturbing the standard visualizations. We further conduct a user study to compare the performance of CNNs and humans. Our results show that CNNs outperform humans only when the training and test data have the same visual encodings; otherwise, they may perform worse. We also find that CNNs are sensitive to perturbations in various visual encodings, regardless of their relevance to the target bars. In contrast, humans are mainly influenced by bar lengths. Our study suggests that robust relational reasoning with visualizations is challenging for CNNs. Improving CNNs' generalization performance may require training them to better recognize task-related visual properties.
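To make the ratio-estimation task concrete, the sketch below samples one trial: random bar lengths and two distinct target bars at random indices, labeled with the ratio of the shorter target bar to the longer one. It assumes the Cleveland and McGill convention of "smaller as a fraction of larger"; the bar count, the length range, and the names `length_ratio` and `make_trial` are hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional, Tuple


def length_ratio(a: float, b: float) -> float:
    """Ratio of the shorter to the longer of two bar lengths, in (0, 1]."""
    if a <= 0 or b <= 0:
        raise ValueError("bar lengths must be positive")
    return min(a, b) / max(a, b)


def make_trial(
    num_bars: int = 10,
    max_len: int = 100,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], Tuple[int, int], float]:
    """Sample one hypothetical trial: random lengths and two random target bars."""
    if rng is None:
        rng = random.Random()
    # Random integer bar lengths in [1, max_len] (assumed range).
    lengths = [rng.randint(1, max_len) for _ in range(num_bars)]
    # Two distinct target indices, so targets are not fixed positions in the chart.
    i, j = rng.sample(range(num_bars), 2)
    return lengths, (i, j), length_ratio(lengths[i], lengths[j])
```

A renderer (for example, a Vega-Lite spec that highlights bars `i` and `j`) would then turn `lengths` into an image, and a CNN would be trained to regress the returned ratio.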

Paper Structure

This paper contains 19 sections, 5 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Performance comparison of eight network architectures trained with eight sets of hyper-parameters in type 1 of the position-length experiment. The best model for each architecture is highlighted and labeled with its MLAE value, while the others are shaded.
  • Figure 2: Sample images illustrating the three steps for expanding the parametric space: (a) from 10 fixed length values to 20 fixed values, (b) random length values, and (c) from fixed indices of target bars to random indices; (d) control settings, MLAE values, and confidence intervals (CIs) for the three expansion steps in (a-c).
  • Figure 3: Our five types of stimuli used in the position-length experiment, where each bar chart includes colorized bars, axes, tick labels, and titles.
  • Figure 4: An exemplar training stimulus of type 1 (a) and the test stimuli (b-j) generated by perturbing eight different parameters of one type-1 bar chart at one specific level.
  • Figure 5: The mean MLAE values produced by CNNs in generalization tests of eight parameters on five types of bar charts. (a-h) Each curve shows how the MLAE values change as the corresponding parameter values increase or decrease. The dotted lines indicate the MLAE value computed for the stimuli with non-perturbed parameters. (i) Estimated MLAE values for bar charts with bar lengths encoded by different value ranges.
  • ...and 5 more figures
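The figures above report accuracy as MLAE (mean log absolute error) without defining it. A minimal sketch follows, assuming the standard definition from Cleveland and McGill's graphical perception studies and its CNN replication by Haehn et al.: each estimate contributes log2(|predicted - true| + 1/8) with values expressed in percent, and the scores are averaged. Whether this paper uses exactly the same offset and scale is an assumption; the function name `mlae` is illustrative.

```python
import numpy as np


def mlae(predicted, true) -> float:
    """Mean log absolute error over a set of ratio estimates (assumed definition).

    Both inputs are percentages (0-100). Each error is log2(|p - t| + 1/8);
    the 1/8 offset keeps perfect estimates finite (they score log2(1/8) = -3).
    Lower is better.
    """
    p = np.asarray(predicted, dtype=float)
    t = np.asarray(true, dtype=float)
    if p.shape != t.shape:
        raise ValueError("predicted and true must have the same shape")
    return float(np.mean(np.log2(np.abs(p - t) + 0.125)))
```

Because of the logarithm, a model that is off by about 8 percentage points on every trial scores near 3, while a perfect model scores -3.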