Trust Your Gut: Comparing Human and Machine Inference from Noisy Visualizations

Ratanond Koonchanok; Michael E. Papka; Khairi Reda

TL;DR

The paper investigates how people infer parameters of known data-generating processes from noisy bivariate visualizations and compares human inferences to Bayesian benchmarks. It introduces a graphical elicitation method to externalize priors and posteriors on the parameters $\mu$ and $\sigma$, and tests three visualization types across varying sample sizes and extremeness in two experiments. Results show that humans are generally less accurate than Bayesian agents, but can outperform them in extreme-sample conditions, especially with icon array displays, and are more resilient to spurious data at the cost of higher variability and overconfidence. The findings motivate human-machine collaboration in visual analytics, suggesting tool designs that leverage analyst intuition alongside formal statistical models to improve inference and decision-making.

Abstract

People commonly utilize visualizations not only to examine a given dataset, but also to draw generalizable conclusions about the underlying models or phenomena. Prior research has compared human visual inference to that of an optimal Bayesian agent, with deviations from rational analysis viewed as problematic. However, human reliance on non-normative heuristics may prove advantageous in certain circumstances. We investigate scenarios where human intuition might surpass idealized statistical rationality. In two experiments, we examine individuals' accuracy in characterizing the parameters of known data-generating models from bivariate visualizations. Our findings indicate that, although participants generally exhibited lower accuracy compared to statistical models, they frequently outperformed Bayesian agents, particularly when faced with extreme samples. Participants appeared to rely on their internal models to filter out noisy visualizations, thus improving their resilience against spurious data. However, participants displayed overconfidence and struggled with uncertainty estimation. They also exhibited higher variance than statistical machines. Our findings suggest that analyst gut reactions to visualizations may provide an advantage, even when departing from rationality. These results carry implications for designing visual analytics tools, offering new perspectives on how to integrate statistical models and analyst intuition for improved inference and decision-making. The data and materials for this paper are available at https://osf.io/qmfv6

Paper Structure (33 sections, 4 equations, 10 figures, 1 table)

Introduction
Background & Related Work
Heuristic vs. Rational Decision-Making
Inference from Visualizations
Research Questions & Methods
Model Elicitation
Stimuli and Data-Generating Models
Sample Configurations
Experiment I
Experiment Design
Hypotheses
Participants
Procedure
Response and Accuracy Metrics
Analysis and Modeling
...and 18 more sections

Figures (10)

Figure 1: A graphical belief elicitation device for expressing beliefs about bivariate relationships in response to a prompt (top). Participants externalized their prior and posterior beliefs in two steps: (1) indicating the most likely relationship, and (2) specifying their uncertainty in the relationship. These slider settings update two parameters, $\mu$ and $\sigma$, in a linear model. During this interaction, participants see hypothetical samples from this model, refreshed at 5 Hz, illustrating what the bivariate data might look like if their beliefs were true.
Figure 2: Distribution of extremeness for 'medium' sample sizes ($N=15$ points). With $\Delta R \approx 0.1$, the middle sample is a relatively faithful depiction of the ground truth (no correlation in this example). The sample on the left is more extreme with $\Delta R \approx -0.3$. The right-most scatterplot represents an even more extreme occurrence with $\Delta R \approx 0.6$.
Figure 3: Top: Flow diagram illustrating the procedure for Exp. 1. The blue-shaded section represents a single trial. In each trial, participants: 1) externalize their prior belief about a prompt question by adjusting two sliders, 2) observe a sample from the ground truth presented alongside their belief, 3) indicate how reliable they believe the sample is, and 4) specify their updated (posterior) belief by re-adjusting the sliders. Depending on the visualization condition, participants are shown scatterplots, bar charts (A), or icon arrays (B).
Figure 4: Estimated mean divergence ($\pm$ 95% credible intervals) for participants vs. informed and uninformed (flat prior) agents. Smaller divergence from zero indicates better accuracy at inferring the true slope. Right: marginalized effects of Visualization, Consensus, and Sample size.
Figure 5: Estimated spread (standard deviation) of $\Delta \mu$ ($\pm$ 95% CIs) for participants (red) vs. informed and uninformed Bayesians. A larger spread implies higher variability in inference accuracy. The plot on the right shows the marginalized effects of visualization type.
...and 5 more figures
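
The belief model behind the elicitation device (Figure 1) and the extremeness measure (Figure 2) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the slope is drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$, that the variables are standardized so the slope acts as a correlation, and that $\Delta R$ is the sample Pearson correlation minus the ground-truth correlation. The function names `sample_from_belief` and `extremeness` are hypothetical.

```python
import numpy as np


def sample_from_belief(mu, sigma, n_points, rng):
    """Draw one hypothetical bivariate sample from a belief (mu, sigma).

    Assumption: slope ~ Normal(mu, sigma) and y = slope * x + noise,
    with x standardized. This is an illustrative model, not the paper's.
    """
    slope = rng.normal(mu, sigma)
    x = rng.standard_normal(n_points)
    # Assumed noise scale: keeps y near unit variance when |slope| < 1,
    # so the slope can be read as a population correlation.
    noise_sd = np.sqrt(max(1.0 - slope**2, 1e-6))
    y = slope * x + noise_sd * rng.standard_normal(n_points)
    return x, y


def extremeness(x, y, true_r):
    """Assumed definition of Delta R: sample Pearson r minus the true r."""
    return float(np.corrcoef(x, y)[0, 1] - true_r)


rng = np.random.default_rng(seed=0)

# A belief of "no relationship, fairly certain", shown as a medium sample (N = 15).
x, y = sample_from_belief(mu=0.0, sigma=0.1, n_points=15, rng=rng)

# How far this particular sample strays from a ground truth of zero correlation.
delta_r = extremeness(x, y, true_r=0.0)
```

Redrawing the sample several times per second with fresh random draws would mimic the refreshing preview described in Figure 1. Larger values of `sigma` make successive previews disagree more about the direction and strength of the relationship.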
