VMC: A Grammar for Visualizing Statistical Model Checks

Ziyang Guo, Alex Kale, Matthew Kay, Jessica Hullman

TL;DR

This work addresses the challenge of designing effective graphical model checks by introducing VMC, a high-level grammar that decomposes model check visualizations into four components: Sampling_spec, Data_transformation, Visual_representation, and Comparative_layout. VMC is implemented as an R package that supports both Bayesian and non-Bayesian models and outputs ggplot2 plots. It allows concise specification of both canonical and novel checks and requires less code than existing tools. The authors validate VMC by reproducing canonical checks and by showing reduced edit distance relative to ggplot2 and bayesplot. An interview study with three expert modelers suggests that the grammar aligns with their practice and could encourage exploration, while also surfacing challenges. Collectively, VMC offers a practical, extensible framework for broadening the use and effectiveness of graphical model checks in statistics and data science, with possible implications for teaching and for tooling in exploratory and confirmatory workflows.

Abstract

Visualizations play a critical role in validating and improving statistical models. However, the design space of model check visualizations is not well understood, making it difficult for authors to explore and specify effective graphical model checks. VMC defines a model check visualization using four components: (1) samples of distributions of checkable quantities generated from the model, including predictive distributions for new data and distributions of model parameters; (2) transformations on observed data to facilitate comparison; (3) visual representations of distributions; and (4) layouts to facilitate comparing model samples and observed data. We contribute an implementation of VMC as an R package. We validate VMC by reproducing a set of canonical model check examples, and show how using VMC to generate model checks reduces the edit distance between visualizations relative to existing visualization toolkits. The findings of an interview study with three expert modelers who used VMC highlight challenges and opportunities for encouraging exploration of correct, effective model check visualizations.
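To make the four-part decomposition concrete, here is a minimal sketch of a specification object with one field per component. VMC itself is an R package; this Python version is only an illustration, and the class name `ModelCheckSpec`, its field names, and the option vocabularies are hypothetical, not VMC's API. The `edit_count` helper is likewise a crude stand-in, not the paper's edit-distance measure: it only shows why swapping a single component should be a small change.

```python
from dataclasses import dataclass, replace

# Hypothetical option vocabularies for illustration only; VMC's actual options may differ.
SAMPLE_SOURCES = {"posterior_predictive", "parameters"}
TRANSFORMS = {"identity", "residual", "quantile"}
VISUALS = {"density", "histogram", "dots", "line_ribbon"}
LAYOUTS = {"superposition", "juxtaposition", "explicit_encoding"}


@dataclass(frozen=True)
class ModelCheckSpec:
    """One field per component of the four-part decomposition (hypothetical names)."""
    sampling: str = "posterior_predictive"  # (1) which checkable quantity to sample
    n_draws: int = 50                       # number of model draws to display
    transformation: str = "identity"        # (2) transformation applied before comparison
    visual: str = "density"                 # (3) how each distribution is drawn
    layout: str = "superposition"           # (4) how model samples and data are compared

    def __post_init__(self):
        # Reject values outside the illustrative vocabularies above.
        checks = [
            (self.sampling, SAMPLE_SOURCES, "sampling"),
            (self.transformation, TRANSFORMS, "transformation"),
            (self.visual, VISUALS, "visual"),
            (self.layout, LAYOUTS, "layout"),
        ]
        for value, allowed, name in checks:
            if value not in allowed:
                raise ValueError(f"unknown {name}: {value!r}")
        if self.n_draws < 1:
            raise ValueError("n_draws must be positive")


def edit_count(a: ModelCheckSpec, b: ModelCheckSpec) -> int:
    """Count how many component fields differ between two specifications."""
    fields = ("sampling", "n_draws", "transformation", "visual", "layout")
    return sum(getattr(a, f) != getattr(b, f) for f in fields)


base = ModelCheckSpec()
alternative = replace(base, visual="dots")  # swap only the visual representation
```

Because each component is independent, `alternative` differs from `base` in exactly one field, so exploring a different check is a one-field edit rather than a rewrite of plotting code.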

Paper Structure

This paper contains 26 sections, 12 equations, and 4 figures.

Figures (4)

  • Figure 1: The formal specification of VMC, including the four main components: Sampling_spec, Data_transformation, Visual_representation, and Comparative_layout.
  • Figure 2: A subset of the visual representations that VMC can specify. Not shown: animation (hypothetical outcome plots, HOPs), which can be applied to any of them.
  • Figure 3: Model check examples for the updated model.
  • Figure 4: (A) Raincloud plots are constructed from two visual representation layers for model predictions and one for the observed data. (B) A line + ribbon plot combines a line showing the mean of the predictions with ribbons showing the prediction intervals. (C) HOPs use animation to combine the samples from the model.
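The line + ribbon plot in Figure 4(B) rests on a simple summary of the model's predictive draws: a per-x mean for the line and central quantiles for the ribbon. A minimal Python sketch of that summary step follows, assuming the draws are stored as a NumPy array of shape (n_draws, n_x). The function name `line_ribbon_summary` and the simulated data are hypothetical and not part of VMC, which computes and draws such layers through ggplot2 in R.

```python
import numpy as np


def line_ribbon_summary(pred_draws: np.ndarray, interval: float = 0.95):
    """Summarize predictive draws for a line + ribbon plot.

    pred_draws: array of shape (n_draws, n_x), one row per model draw and
    one column per x value (an assumed layout for this sketch).
    Returns the per-x mean (the line) and the lower/upper quantiles of a
    central interval (the ribbon).
    """
    if pred_draws.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_draws, n_x)")
    if not 0 < interval < 1:
        raise ValueError("interval must be in (0, 1)")
    tail = (1 - interval) / 2
    mean = pred_draws.mean(axis=0)
    lower = np.quantile(pred_draws, tail, axis=0)
    upper = np.quantile(pred_draws, 1 - tail, axis=0)
    return mean, lower, upper


# Simulated draws standing in for a fitted model's predictions (hypothetical data).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
draws = 2.0 * x + rng.normal(scale=0.3, size=(1000, x.size))
mean, lower, upper = line_ribbon_summary(draws, interval=0.9)
```

The observed data would then be overlaid on the resulting line and ribbon, which corresponds to a superposition layout. Several interval widths (for example 0.5 and 0.9) can be computed the same way to draw nested ribbons.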