
PriorWeaver: Prior Elicitation via Iterative Dataset Construction

Yuwei Xiao, Shuai Ma, Antti Oulasvirta, Eunice Jun

TL;DR

PriorWeaver tackles the difficulty of prior elicitation by reframing it as an iterative dataset-construction task in observable space, enabling domain experts to express distributions and relationships through coordinated visualizations. It derives priors by bootstrapping the analyst-constructed dataset and fitting a predefined model, with prior predictive checks delivering actionable feedback to guide refinement. In a lab study with Bayesian novices, PriorWeaver produced priors that were more aligned with analysts' beliefs and increased willingness to adopt Bayesian methods compared with a parameter-space baseline. The work demonstrates that interactive dataset construction can lower barriers to Bayesian analysis and suggests design paths for expanding to richer models and feedback in real-world workflows.
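The bootstrap-and-fit step can be sketched in a few lines. This is a minimal sketch, not PriorWeaver's implementation. It assumes the pre-specified model is a linear regression y = alpha + X·beta + noise with residual SD sigma, which matches the parameter names used in Figure 5. It uses ordinary least squares as the fitting routine and summarizes each parameter's bootstrap estimates with a normal (mean, sd) as the smoothing step. The helper name `derive_priors` and the synthetic income/age data are hypothetical and serve only as illustration.

```python
import numpy as np

def derive_priors(X, y, n_boot=1000, seed=0):
    """Bootstrap prior derivation for y = alpha + X @ beta + noise (hypothetical helper).

    X: (n, k) predictor matrix and y: (n,) response, both taken from the
    analyst-constructed dataset. Returns {parameter: (mean, sd)}.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # leading column of ones -> alpha
    estimates = []
    for _ in range(n_boot):
        # Step 1: resample rows with replacement
        idx = rng.integers(0, n, size=n)
        Xb, yb = design[idx], y[idx]
        # Step 2: fit the pre-specified model (ordinary least squares here)
        coef, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        resid = yb - Xb @ coef
        sigma = np.sqrt(resid @ resid / max(n - k - 1, 1))  # residual SD
        estimates.append(np.append(coef, sigma))
    est = np.array(estimates)
    # Step 3: aggregate across bootstrap samples and smooth; in this sketch
    # each parameter's estimates are summarized by a normal (mean, sd)
    names = ["alpha"] + [f"beta_{j}" for j in range(k)] + ["sigma"]
    return {name: (est[:, i].mean(), est[:, i].std(ddof=1))
            for i, name in enumerate(names)}

# Illustrative synthetic data (made up, not from the paper): income vs. age
rng = np.random.default_rng(1)
age = rng.uniform(20, 65, size=200)
income = 20000 + 800 * age + rng.normal(0, 5000, size=200)
priors = derive_priors(age[:, None], income)
```

A normal summary is the simplest smoothing choice, but it can place mass below zero for sigma. A log-normal or half-normal fit, or a kernel density estimate, would respect positivity. The smoothing PriorWeaver actually uses may differ.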

Abstract

In Bayesian analysis, prior elicitation, or the process of explicating one's beliefs to inform statistical modeling, is an essential yet challenging step. Analysts often have beliefs about real-world variables and their relationships. However, existing tools require analysts to translate these beliefs and express them indirectly as probability distributions over model parameters. We present PriorWeaver, an interactive visualization system that facilitates prior elicitation through iterative dataset construction and refinement. Analysts visually express their assumptions about individual variables and their relationships. Under the hood, these assumptions create a dataset used to derive statistical priors. Prior predictive checks then help analysts compare the priors to their assumptions. In a lab study with 17 participants new to Bayesian analysis, we compare PriorWeaver to a baseline incorporating existing techniques. Compared to the baseline, PriorWeaver gave participants greater control, clarity, and confidence, leading to priors that were better aligned with their expectations.
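A prior predictive check of the kind described above can be sketched as follows. This is a hedged illustration, not the system's code. It assumes the same linear model (alpha, beta, sigma) and treats each prior as a normal given by a (mean, sd) pair. Sigma is folded to stay positive. The sketch then compares the observed response histogram with the average simulated histogram on shared bins. The helpers `prior_predictive` and `compare_to_response` are hypothetical, as are all numeric values.

```python
import numpy as np

def prior_predictive(X, priors, n_draws=50, seed=0):
    """Simulate response datasets from the priors (hypothetical helper).

    X: (n, k) predictor matrix from the constructed dataset.
    priors: {"alpha": (mean, sd), "beta_0": ..., "sigma": (mean, sd)},
    each treated as a normal prior (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    sims = np.empty((n_draws, n))
    for d in range(n_draws):
        # Draw one parameter set from the priors
        alpha = rng.normal(*priors["alpha"])
        beta = np.array([rng.normal(*priors[f"beta_{j}"]) for j in range(k)])
        sigma = abs(rng.normal(*priors["sigma"]))  # fold to keep sigma positive
        # Sample predictor rows from the dataset and simulate responses
        rows = X[rng.integers(0, n, size=n)]
        sims[d] = alpha + rows @ beta + rng.normal(0.0, sigma, size=n)
    return sims

def compare_to_response(y, sims, bins=20):
    """Observed response histogram alongside the average predictive histogram."""
    edges = np.histogram_bin_edges(np.concatenate([y, sims.ravel()]), bins=bins)
    observed, _ = np.histogram(y, bins=edges, density=True)
    per_draw = np.array([np.histogram(s, bins=edges, density=True)[0] for s in sims])
    return edges, observed, per_draw.mean(axis=0)

# Illustrative inputs (made-up values, not from the paper)
rng = np.random.default_rng(2)
age = rng.uniform(20, 65, size=200)
income = 20000 + 800 * age + rng.normal(0, 5000, size=200)
priors = {"alpha": (20000, 2000), "beta_0": (800, 50), "sigma": (5000, 300)}
sims = prior_predictive(age[:, None], priors)
edges, observed, predicted = compare_to_response(income, sims)
```

Bins where `observed` and `predicted` diverge sharply point to a mismatch between the derived priors and the analyst's assumptions about the response. A plot of each row of `sims` as a faint curve would mirror the kind of display the paper describes.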

Paper Structure

This paper contains 17 sections, 5 figures, and 1 table.

Figures (5)

  • Figure 1: PriorWeaver's user interface. (a) An information panel displays the model formula, variables, and parameters. To externalize their knowledge for priors, analysts work in the central coordinated visualizations panel, which includes (b) univariate histograms for variable distributions, (c) bivariate scatterplots for pairwise relationships, and (d) a parallel coordinates plot for multivariate relationships. (f) Brushing on the parallel coordinates plot's axes serves as a cross-filter, with selections (blue dots) synchronized across all visualizations. (e) Analysts can toggle between displaying complete and incomplete entities (white dots), with the others hidden (gray dots). In Complete mode, they can use the generate function to define multivariate assumptions within brushed regions and add the generated entities to the constructed dataset. To derive and evaluate the priors, analysts can click translate to view (g) prior predictive checks and (h) suggested prior distributions.
  • Figure 2: Building relationships across multiple variables using the parallel coordinates plot. (a) When analysts select the incomplete mode, PriorWeaver displays only the incomplete entities (white dots) and hides the complete entities (gray dots). When analysts brush regions on axes to select and connect entities, PriorWeaver automatically identifies the maximum possible connections and previews (b) potential connections (orange dashed lines) and (c) the corresponding potential entities (orange dots). Analysts can then click connect to establish the connections and merge these entities in the dataset under construction.
  • Figure 3: Interactive dataset construction. As analysts interact with the visualizations to externalize their knowledge, PriorWeaver simultaneously constructs a dataset that represents this knowledge behind the scenes. The constructed dataset embodies analysts’ knowledge in two dimensions: columns record distributional assumptions about each variable, while rows link these values together, reflecting relational knowledge across variables.
  • Figure 4: Deriving priors. PriorWeaver derives priors in three steps. (1) First, generate multiple datasets by sampling with replacement from the analyst-constructed dataset. (2) Next, fit the pre-specified statistical model to each bootstrapped dataset and obtain parameter estimates. (3) Finally, aggregate these estimates across samples and smooth them to form continuous prior distributions.
  • Figure 5: Feedback through prior predictive checks. PriorWeaver samples predictor values (e.g., age, education) from the analyst-constructed dataset (top left), draws parameter sets (e.g., $\alpha$, $\beta$, $\sigma$) from the derived priors (bottom left), and combines them to generate predictive distributions (right). On the right, each predictive distribution is shown as a faded blue line, with the average depicted as a solid blue line. Analysts can detect discrepancies between these predictive distributions and the histogram of the response variable, e.g., income (top right).
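The dataset-construction idea in Figures 2 and 3 can be made concrete with a small data model. In it, each column collects distributional assumptions about one variable, each row links values across variables, and "connect" merges brushed incomplete entities into linked rows. This is a hypothetical sketch, not PriorWeaver's implementation. It assumes that values are drawn uniformly within each histogram bin. It also assumes that brushed entities on the two axes are paired one-to-one in list order, up to the maximum possible number of connections. The names `Entity`, `entities_from_histogram`, `connect`, and `complete_rows` are illustrative.

```python
import random
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare by identity so list.remove drops the exact object
class Entity:
    # variable name -> value; variables not yet linked are simply absent
    values: dict = field(default_factory=dict)

def entities_from_histogram(var, bins, counts, rng):
    """Turn an analyst-drawn histogram into single-variable (incomplete) entities."""
    out = []
    for (lo, hi), count in zip(bins, counts):
        out += [Entity({var: rng.uniform(lo, hi)}) for _ in range(count)]
    return out

def connect(entities, var_a, bounds_a, var_b, bounds_b):
    """Merge entities brushed on two axes into linked rows; returns the number merged."""
    def brushed(var, bounds, missing):
        lo, hi = bounds
        return [e for e in entities
                if var in e.values and missing not in e.values
                and lo <= e.values[var] <= hi]
    side_a = brushed(var_a, bounds_a, var_b)
    side_b = brushed(var_b, bounds_b, var_a)
    n = min(len(side_a), len(side_b))  # maximum possible one-to-one connections
    for ea, eb in zip(side_a[:n], side_b[:n]):
        ea.values.update(eb.values)  # ea now carries both variables
        entities.remove(eb)          # eb has been merged into ea
    return n

def complete_rows(entities, variables):
    """Rows of the constructed dataset that have a value for every variable."""
    return [[e.values[v] for v in variables] for e in entities
            if all(v in e.values for v in variables)]

# Illustrative usage with made-up histograms
rng = random.Random(0)
entities = (entities_from_histogram("age", [(20, 40), (40, 60)], [3, 2], rng)
            + entities_from_histogram("income", [(30000, 60000), (60000, 90000)], [2, 3], rng))
n_linked = connect(entities, "age", (40, 60), "income", (60000, 90000))
rows = complete_rows(entities, ["age", "income"])
```

Here two ages fall in the brushed age range and three incomes in the brushed income range, so at most two connections are possible. Pairing in list order is the neutral choice. Pairing by sorted rank instead would encode a positive relationship between the two variables, which is one way a brush could express relational knowledge.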