Guided Statistical Workflows with Interactive Explanations and Assumption Checking

Yuqi Zhang, Adam Perer, Will Epperson

TL;DR

GuidedStats tackles the risk of misusing statistical methods by embedding guided, stepwise workflows and automatic assumption checks inside computational notebooks. It provides two workflows, Linear Regression and Two-Sample T-Test, each with explanations, visualizations, and exportable code, so users can iteratively refine an analysis when assumptions fail. Case studies demonstrate how the system surfaces assumption violations (e.g., outliers, multicollinearity, unequal variances) and offers actionable next steps, while reporting quantitative results such as $R^2$ and $t$ statistics to support interpretability and reproducibility. Overall, the paper presents a practical framework that integrates GUI guidance with code-based analysis, and outlines future work to expand workflows and interfaces for richer statistical practice.

Abstract

Statistical practices such as building regression models or running hypothesis tests rely on following rigorous procedures of steps and verifying assumptions on data to produce valid results. However, common statistical tools do not verify users' decision choices and provide low-level statistical functions without instructions on the whole analysis practice. Users can easily misuse analysis methods, potentially decreasing the validity of results. To address this problem, we introduce GuidedStats, an interactive interface within computational notebooks that encapsulates guidance, models, visualization, and exportable results into interactive workflows. It breaks down typical analysis processes, such as linear regression and two-sample T-tests, into interactive steps supplemented with automatic visualizations and explanations for step-wise evaluation. Users can iterate on input choices to refine their models, while recommended actions and exports allow the user to continue their analysis in code. Case studies show how GuidedStats offers valuable instructions for conducting fluid statistical analyses while finding possible assumption violations in the underlying data, supporting flexible and accurate statistical analyses.
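
To make the check, edit, and re-check loop concrete for the multicollinearity violation mentioned in the TL;DR, the following is a minimal sketch using variance inflation factors (VIF). It is not GuidedStats code or its API: the helper name `vif`, the toy data, and the rule-of-thumb threshold of 10 are all assumptions made for illustration, and the source does not state which multicollinearity diagnostic the tool uses.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (hypothetical helper).

    Each column is regressed on the remaining columns plus an intercept;
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        factors.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return factors


# Toy predictors (illustrative only): x2 is almost exactly 2 * x1.
x1 = np.arange(1.0, 9.0)
x2 = 2.0 * x1 + np.array([0.1, -0.1] * 4)
x3 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])

THRESHOLD = 10.0  # common rule of thumb, an assumption here

# Step 1: check the assumption on the full set of predictors.
before = vif(np.column_stack([x1, x2, x3]))
flagged = [j for j, v in enumerate(before) if v > THRESHOLD]

# Step 2: edit the data (drop the redundant predictor x2), then re-verify.
after = vif(np.column_stack([x1, x3]))
```

In this toy data the first check flags `x1` and `x2`, and the re-check after dropping `x2` no longer flags anything, which mirrors the iterate-on-inputs pattern the abstract describes.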

Paper Structure

This paper contains 10 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: GuidedStats supports an interactive loop of assumption checking, editing the data, then re-verifying assumptions.
  • Figure 2: For assumption checking steps, like the homogeneity of variance in the T-test workflow, GuidedStats recommends potential actions based on the results of assumption checks. In this example, the check suggests the variances are equal and the user can select the action to pre-set this parameter to True in the later model specification step.
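
The pattern in the Figure 2 caption, where the outcome of a variance check pre-sets the equal-variance parameter of the later t-test, can be sketched in plain code as follows. This is only an illustration under stated assumptions, not the GuidedStats implementation: the function name `two_sample_t_with_variance_check`, the use of Levene's test, the 0.05 cutoff, and the sample data are all chosen here for the example, and the source does not say which homogeneity test the tool runs. The SciPy calls `scipy.stats.levene` and `scipy.stats.ttest_ind` are real.

```python
from scipy import stats


def two_sample_t_with_variance_check(a, b, alpha=0.05):
    """Run a variance check first, then a two-sample t-test (hypothetical helper).

    If Levene's test does not reject equal variances at level alpha,
    Student's t-test is used (equal_var=True); otherwise Welch's
    t-test is used (equal_var=False).
    """
    levene_stat, levene_p = stats.levene(a, b)
    equal_var = bool(levene_p >= alpha)
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)
    return {
        "levene_p": float(levene_p),
        "equal_var": equal_var,
        "t": float(t_stat),
        "p": float(p_value),
    }


# Illustrative samples: same spread, shifted mean.
group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = [3, 4, 5, 6, 7, 8, 9, 10]
same_spread = two_sample_t_with_variance_check(group_a, group_b)

# Illustrative samples: very different spread.
group_c = [0, 10, 20, 30, 40, 50, 60, 70]
different_spread = two_sample_t_with_variance_check(group_a, group_c)
```

For the first pair the check does not reject equal variances, so `equal_var` is set to `True`, matching the scenario in the caption. For the second pair the check rejects, and the sketch falls back to Welch's test.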