Using Counterfactuals to Improve Causal Inferences from Visualizations

David Borland, Arran Zeyu Wang, David Gotz

TL;DR

Visual data explorations often lead users to infer causality from correlations, risking invalid conclusions. The paper advocates counterfactual reasoning as a scalable approach to support visual causal inference and introduces the CoFact framework, which defines included, excluded, and counterfactual subsets to compare outcomes. It discusses limitations of traditional graph-based causal models (DAGs/DCGs) in large, complex data and outlines open challenges and future directions, including cognitive modeling, communication of counterfactuals, and measures for subset quality. The work emphasizes practical implications for designing visualization tools that help users make more credible causal judgments in data-rich settings.

Abstract

Traditional approaches to data visualization have often focused on comparing different subsets of data, as reflected in the many techniques developed and evaluated over the years for visual comparison. Similarly, common workflows for exploratory visualization are built upon the idea of users interactively applying various filter and grouping mechanisms in search of new insights. This paradigm has proven effective at helping users identify correlations between variables that can inform thinking and decision-making. However, recent studies show that consumers of visualizations often draw causal conclusions even when these are not supported by the data. Motivated by these observations, this article highlights recent advances from a growing community of researchers exploring methods that aim to directly support visual causal inference. Many of these approaches, however, have limitations that restrict their use in real-world scenarios. This article therefore also outlines a set of key open challenges and corresponding priorities for new research to advance the state of the art in visual causal inference.

Paper Structure (17 sections, 3 figures)

Figures (3)

  • Figure 1: Example causal graphs: (a) a simple causal graph in which social media use decreases happiness, (b) a graph with a confounder, in which the apparent causal relationship between social media use and happiness is in fact caused by a third factor, number of coworker friends, that has a causal effect on both, and (c) a graph with a collider, in which the apparent causal relationship between social media use and happiness is due to both having a causal effect on this third factor. In this case, the collider also exhibits a cycle.
  • Figure 2: The CoFact visualization system (Kaul et al., 2021) leverages counterfactuals to help better communicate the relationship between variables of interest. In this figure's example, the user has applied a filter constraint on square footage (a) to a multidimensional house sales dataset. In response, they are shown the resulting included, counterfactual, and excluded subsets (b), along with their corresponding distributions for a selected outcome feature of interest: house sale price (c). Additional feature-to-outcome relationships can be explored with supplementary visualizations (d–i). The tool supports comparisons between an included subset (data points that match user-specified inclusion criteria) and a counterfactual subset containing similar data points selected from those data points that do not meet the inclusion criteria.
  • Figure 3: A counterfactual subset contains data points from the excluded set that are the most similar to those in the included set. Prior work (Kaul et al., 2021) has shown that a visualization that allows users to compare the counterfactual subset against the included subset (c) supports more accurate causal inferences compared to a more traditional approach (b).
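
The included, excluded, and counterfactual subsets described in the Figure 2 and Figure 3 captions can be illustrated with a minimal Python sketch. This is not the CoFact implementation. The function name `split_subsets`, the toy house records, the range-scaled Euclidean distance, and the one-nearest-neighbour selection rule are all assumptions made for illustration; the captions only state that the counterfactual subset holds the excluded points most similar to the included ones.

```python
from math import sqrt
from statistics import mean


def split_subsets(rows, filter_feature, predicate, similarity_features):
    """Return (included, excluded, counterfactual) lists of rows.

    Hypothetical helper: the similarity measure and selection rule
    below are illustrative assumptions, not the published method.
    """
    included = [r for r in rows if predicate(r[filter_feature])]
    excluded = [r for r in rows if not predicate(r[filter_feature])]

    # Assumption: scale each similarity feature by its range so that
    # no single feature dominates the distance.
    spans = {}
    for f in similarity_features:
        values = [r[f] for r in rows]
        spans[f] = (max(values) - min(values)) or 1.0

    def distance(a, b):
        return sqrt(sum(((a[f] - b[f]) / spans[f]) ** 2
                        for f in similarity_features))

    # Assumption: for every included point, keep its single nearest
    # excluded point; duplicates collapse because indices go in a set.
    chosen = set()
    if excluded:
        for inc in included:
            nearest = min(range(len(excluded)),
                          key=lambda i: distance(inc, excluded[i]))
            chosen.add(nearest)
    counterfactual = [excluded[i] for i in sorted(chosen)]
    return included, excluded, counterfactual


# Made-up house sales records (price in thousands), for illustration only.
houses = [
    {"sqft": 2400, "bedrooms": 4, "age": 10, "price": 520},
    {"sqft": 2100, "bedrooms": 3, "age": 25, "price": 430},
    {"sqft": 1500, "bedrooms": 4, "age": 12, "price": 410},
    {"sqft": 1200, "bedrooms": 2, "age": 60, "price": 210},
    {"sqft": 1800, "bedrooms": 3, "age": 22, "price": 380},
    {"sqft": 900, "bedrooms": 1, "age": 70, "price": 150},
]

# Filter on square footage; measure similarity on the other non-outcome features.
included, excluded, counterfactual = split_subsets(
    houses, "sqft", lambda v: v >= 2000, ["bedrooms", "age"]
)

for name, subset in [("included", included),
                     ("counterfactual", counterfactual),
                     ("excluded", excluded)]:
    print(name, mean([r["price"] for r in subset]))
```

In this toy data, the gap in mean price between the included and counterfactual subsets is smaller than the gap between the included and excluded subsets, because the counterfactual houses resemble the included ones in bedrooms and age. That is the kind of comparison the captions describe: differences that survive against a similar comparison group are more plausibly tied to the filtered feature than differences against everything that was filtered out.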