Striking a Balance: Evaluating How Aggregations of Multiple Forecasts Impact Judgment Under Uncertainty

Ruishi Zou, Siyi Wu, Racquel Fygenson, Bingsheng Yao, Dakuo Wang, Lace Padilla

TL;DR

This paper investigates how different levels of partial aggregation of multiple forecasts affect judgment under uncertainty. Using real-world COVID-19 mortality forecast data, it evaluates eight visualization designs across two large online experiments (14 judgment metrics) to determine how aggregation influences performance, surprise, trust, and perceived effort. The key finding is that Horizon Sampled MFV most consistently improves predictive accuracy and lowers surprise, while no single design excels across all metrics; partial aggregation broadens the design space and offers tailored trade-offs for communication goals. The work advances uncertainty visualization by demonstrating that partial aggregation can achieve a practical balance between expressiveness and interpretability in public-facing forecasting contexts.

Abstract

Decision-makers consult multiple forecasts to account for uncertainties when forming judgments about future events. While prior works have compared unaggregated and highly-aggregated designs for displaying multiple forecasts (e.g., Multiple Forecast Visualizations versus confidence interval plots), it remains unclear how partial aggregation impacts judgment. To investigate the effect of partial aggregation, we curated three designs that partially aggregate multiple forecasts. Through two large-scale studies (Experiment 1 n = 695 and Experiment 2 n = 389) across 14 judgment-related metrics, we observed that one design (Horizon Sampled MFV) significantly enhanced participants' ability to predict future trends, thereby reducing their surprise when confronted with the actual outcomes. Grounded in empirical evidence, we provide insights into how to design visualizations for multiple forecasts to communicate uncertainty more effectively. Specifically, since no approach excels in all metrics, we advise choosing different designs based on communication goals and prior knowledge of forecasts.

Paper Structure

This paper contains 32 sections, 1 equation, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Generation process for a Horizon Sampled MFV. We encode clusters (C1-C8) generated at the forecast horizon with color and label the selected forecast path in dark blue.
  • Figure 2: Generation process for Progressively Sampled MFVs. In step 1, we cluster predictions at each time point ($t_1$-$t_4$). In step 2, we "Sankify" the clusters by connecting clusters that are linked by an original forecast path. In step 3, we add visual encodings and embellishments to show the number of forecasts represented by each cluster and the spread of each cluster's forecasts.
  • Figure 3: The visualizations mapped onto the aggregation level spectrum: from MFV (no aggregation) to Confidence Interval Plot (full aggregation). We also include a Mean-only Plot to represent an option with no uncertainty representation.
  • Figure 4: An example stimulus used in the study. The content in the purple box corresponds to a different visualization design.
  • Figure 5: Experimental procedure of Experiment 1. Participants completed the tasks for one visualization design (between-subject condition) while responding to visualizations created from five sub-datasets (within-subject condition, repeated measures) and answered five survey questions. The questions were presented in the order of EQ1 to EQ5, with EQ5 shown last to prevent disclosing the outcomes to participants.
  • ...and 4 more figures
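
The generation processes described in the captions for Figures 1 and 2 can be illustrated with a minimal sketch. The captions do not state which clustering algorithm is used or how the representative path is chosen within a cluster, so the code below makes two loud assumptions: clusters are formed by splitting the sorted forecast values into contiguous groups of near-equal size, and the representative path is the one closest to its cluster's median at the forecast horizon. The function names (`rank_clusters`, `horizon_sample`, `sankify`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import median


def rank_clusters(values, n_clusters):
    """Assign each value a cluster id by splitting the sorted values into
    contiguous, near-equal-size groups (ASSUMPTION: a stand-in for whatever
    clustering method the paper actually uses)."""
    n = len(values)
    if n == 0:
        return []
    k = max(1, min(n_clusters, n))
    order = sorted(range(n), key=lambda i: values[i])
    size, extra = divmod(n, k)
    labels = [0] * n
    start = 0
    for c in range(k):
        end = start + size + (1 if c < extra else 0)
        for i in order[start:end]:
            labels[i] = c
        start = end
    return labels


def horizon_sample(forecasts, n_clusters=8):
    """Figure 1 idea: cluster forecasts at the horizon (last time point) and
    keep one path index per cluster (ASSUMPTION: the path nearest the
    cluster median at the horizon)."""
    horizon = [path[-1] for path in forecasts]
    labels = rank_clusters(horizon, n_clusters)
    selected = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        center = median(horizon[i] for i in members)
        selected.append(min(members, key=lambda i: abs(horizon[i] - center)))
    return selected


def sankify(forecasts, n_clusters=3):
    """Figure 2, steps 1-2: cluster at every time point, then count the
    original forecast paths linking cluster a at time t to cluster b at
    time t + 1. Keys are (t, a, b); values are path counts."""
    n_steps = len(forecasts[0]) if forecasts else 0
    labels = [
        rank_clusters([path[t] for path in forecasts], n_clusters)
        for t in range(n_steps)
    ]
    links = Counter()
    for t in range(n_steps - 1):
        for a, b in zip(labels[t], labels[t + 1]):
            links[(t, a, b)] += 1
    return links


# Toy data: 16 hypothetical forecast paths over 3 time points.
forecasts = [[100, 100 + 5 * i, 100 + 10 * i] for i in range(16)]
representatives = horizon_sample(forecasts, n_clusters=8)
flows = sankify(forecasts, n_clusters=4)
print(representatives)
print(sum(flows.values()))
```

In this sketch, `horizon_sample` returns the indices of the paths that would be drawn in a Horizon Sampled MFV, and the counts returned by `sankify` could drive the link widths added in step 3 of Figure 2. Both choices are illustrative only; a different clustering method or selection rule would change which paths and links appear.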